This tale began with this post, in which I note how–with the help of friends–I regained some energy and inspiration. The story continues, at absurd length (sorry!)…
Part the Second: In which the Big Project is revealed
As will be obvious to some of you (the crazies at LSW Meebo more than most), I’ve been tinkering with this one for quite some time…probably more than a year, actually.
I started keeping notes on the project in a Word document. Here’s most of that document, with some annotations (indented gray paragraphs)–and this may give you a sense of just how long and difficult this gestation has been.
Toward a Global Liblog Survey
Notes toward a stupid project that will take forever and sell maybe 50 copies…
“Forever” is only a slight exaggeration. I’m hoping that 50 copies is conservative.
- All blogs in 2005 “top 60” study as first baseline; all blogs that meet currency criteria included in 2008.
- All blogs in 2006 “great middle” study as second baseline; all blogs that meet currency criteria included in 2008.
- All other blogs found in IWTBF “Favorite blogs” study, or LISWiki, or LISZen source list, or “tag cloud” source list, or just my own discoveries, as of 3/1/08, that match all criteria below.
“IWTBF”: Information Wants to be Free.
Criteria for preliminary inclusion
- In English
- Not clearly defined as an official library blog.
- Somehow related to library people, at least vaguely.
- Established: At least one post before January 1, 2008
- Not defunct: At least one post after August 31, 2007 (as of March 1, 2008)
- Visible: Sum of Bloglines subscriptions and Technorati “Authority” at least 9 (thus, rounds to 1.0 on Visibility scale) when tested in first two weeks of March 2008
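The preliminary criteria amount to a single filter over each candidate. This is a minimal sketch, not the actual procedure: the `Candidate` record and its field names are hypothetical, and the Visibility scale is assumed to be the base-10 logarithm of the combined Bloglines and Technorati count, rounded to one decimal. That assumption fits the note that a sum of 9 rounds to 1.0 (log10 of 9 is about 0.95), while a sum of 8 would round to 0.9.

```python
import math
from dataclasses import dataclass
from datetime import date

@dataclass
class Candidate:
    # Hypothetical record for one candidate blog.
    name: str
    in_english: bool
    official_library_blog: bool
    library_related: bool
    first_post: date
    latest_post: date          # most recent post as of March 1, 2008
    bloglines_subs: int
    technorati_authority: int

def visibility(bloglines_subs: int, technorati_authority: int) -> float:
    """ASSUMED scale: log10 of the combined count, rounded to one decimal."""
    total = bloglines_subs + technorati_authority
    if total <= 0:
        return 0.0
    return round(math.log10(total), 1)

def meets_preliminary_criteria(c: Candidate) -> bool:
    return (
        c.in_english
        and not c.official_library_blog
        and c.library_related
        and c.first_post < date(2008, 1, 1)      # Established
        and c.latest_post > date(2007, 8, 31)    # Not defunct
        and visibility(c.bloglines_subs, c.technorati_authority) >= 1.0  # Visible
    )
```

Under this assumed scale, a combined count of 9 is the smallest integer that passes, which matches the threshold stated in the list.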
For now, all of those criteria are for additional blogs, those not in one of the early surveys–and I’m still pondering “not defunct.” The “Established” and “Visible” criteria are firm, so that there’s some kind of starting point and so that truly “under the radar” blogs–the ones designed for a small circle of friends–can stay that way.
Currency: additional criteria for final inclusion, if done at all – omitted.
Whazzat? The single bullet point said “Current and semi-active: At least one post in two of the three months March, April, May 2008.” That’s comparable to the “active” rule for library blogs (at least one post in two of the three months March, April, May 2007). For several reasons, I concluded that it wasn’t a reasonable criterion this time around.
Blogs added to 2005/2006 lists and blogs not added
Note that some new blogs appear in more than one source. Favorites came first. “Others” came last. I believe LISZen came second and don’t remember the order of the other two.
- Favorites: 48 added.
- LISZen: 81 added
- LISWiki: 37 added
- Cloud: 9 added
- Others (wcc’s picks): 29 added.
- Total added: 204
- Not added because too new: Five (plus some “others”).
- Not added because invisible: 92 (plus some “others”).
- Not added because available but defunct: 97.
- Not added because not reachable: 57.
Adding clearly defunct and not reachable blogs yields more than 150 defunct out of about 450 candidates, about a 33% mortality rate. (Note: Mortality for the 2005-2006 group is handled separately.)
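The tallies in the list can be checked directly. The counts below come straight from the list; the "plus some others" cases have no numbers attached, so they are left out and the result is approximate.

```python
added = {"Favorites": 48, "LISZen": 81, "LISWiki": 37, "Cloud": 9, "Others (wcc's picks)": 29}
not_added = {"too new": 5, "invisible": 92, "available but defunct": 97, "not reachable": 57}

total_added = sum(added.values())           # blogs that made the list
total_rejected = sum(not_added.values())    # blogs that did not
candidates = total_added + total_rejected   # everything examined (excluding uncounted "others")
dead = not_added["available but defunct"] + not_added["not reachable"]
mortality = dead / candidates

print(total_added, candidates, dead, f"{mortality:.1%}")
```

This prints `204 455 154 33.8%`: 204 added, 455 candidates ("about 450"), 154 dead ("more than 150"), and a mortality rate just under 34%.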
At some point, the numbers don’t quite add up. That shouldn’t be surprising…
Baseline and bizarre attempt
There are now 542 blogs in the spreadsheet.
Except for a few that lack feeds, wcc’s Bloglines list includes all of them (and a few others), for 551 feeds in the Library folder.
For at least a week, I’ll track how many new posts (and updated blogs) appear in twice-a-day checks. (Note partway through: I’ll give it two weeks.)
If the number of posts seems very high, I may delete a small number of frequently-updated blogs and note them here.
Completion of stupid experiment on 540 blogs: Over two weeks, there were, on average, 221 posts per day, or 0.41 posts per blog. By comparison, the 213 blogs in the 2006 survey had an average of 104 posts per day, or 0.49 posts per blog, not a convincing difference. (The 60 blogs in the 2005 survey had an average of 55 posts per day, or 0.92 posts per blog, but that was a special handpicked set of blogs.)
First assumption (that, on average, libloggers are posting less often): Not proved, and the evidence is extremely weak at best.
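The per-blog rates follow from dividing average daily posts by the number of blogs tracked. This sketch only reproduces that arithmetic; each group contributes a single two-week (or survey-period) average with no measure of spread, which is consistent with calling the 2008-versus-2006 difference unproven.

```python
def posts_per_blog(posts_per_day: float, blogs: int) -> float:
    """Average new posts per blog per day."""
    return posts_per_day / blogs

samples = {
    "2008 list (540 blogs)": (221, 540),
    "2006 survey (213 blogs)": (104, 213),
    "2005 survey (60 blogs)": (55, 60),
}
for label, (per_day, blogs) in samples.items():
    print(f"{label}: {posts_per_blog(per_day, blogs):.2f}")
```

The three lines print 0.41, 0.49 and 0.92, matching the figures in the text.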
Doing March-May 2007 scans for some portion of the 2005/2006 blogs, both as background for TxLA…and to get some sense of whether I want to continue this nonsense.
Issues include: Should I be tracking illustrations? Should I be tracking # of posts in which links appear? To what extent do blogs allow easy tracking of length, etc.? (Have a column noting exceptions?) Is this just going to be more work than can possibly be justified?
For now: Yes on illustrations. No on links. If blogs hide posts, I’m noting that and not tracking length.
Blogs deleted during 2007 scan
- Society for librarians who say m…. Reason: Just not going to do that one.
- dulemba.com: Reason: No indication of any library focus or interest; a book writer & illustrator.
- Five weeks to a social library. Reason: Hidden posts in archive, and this was a “termed” blog, active during the course, mostly for course participants.
Second Run: Blogrolls
Process: Looked at blogrolls for blogs already in list, based on:
- Front-page blogrolls (no blogrolls from links)
- Plausible length of blogroll
- Some evidence of library focus for blogroll
Scan and results
Roughly 100 blogrolls checked in early May 2008. Results:
- Added: 46 blogs (new total: 585)
- Invisible: 21+
- Defunct (no posts in 2008, or no posts in March-April, thus not included): 42+
- Official library (not obvious from name): 4+
- Too new (no 2007 posts): 4
- Not library-related at all: 15+
- General good taste (excessive obscenities or automatic soundtrack): 2
Decisions Along the Way
For now, I’m leaving in blogs with no posts in March-May 2008 if they had posts in March-May 2007 or were in one of the two earlier surveys.
I’m deleting blogs that had no posts in March-May 2007 and no posts in March-May 2008 and weren’t in one of the two surveys, unless they (a) have been around for a long time or (b) have posts in June 2008 or later. I may need to rethink that (and some other decisions).
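The keep-or-delete rules in these two paragraphs can be written as one function. This is a hypothetical sketch: the parameter names are illustrative, and "been around for a long time" has no stated cutoff, so it is passed in as a judgment-call flag rather than computed.

```python
def keep_blog(posts_mar_may_2008: int,
              posts_mar_may_2007: int,
              in_earlier_survey: bool,
              long_established: bool,          # judgment call; no cutoff given
              posts_june_2008_or_later: bool) -> bool:
    """Return True if the blog stays in the list under the stated rules."""
    if posts_mar_may_2008 > 0:
        return True  # active in the 2008 quarter: nothing to decide
    if posts_mar_may_2007 > 0 or in_earlier_survey:
        return True  # kept for now despite a silent 2008 quarter
    # Silent in both quarters and not in a survey: delete unless an exception applies.
    return long_established or posts_june_2008_or_later
```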
That’s the end of the Word memo–for now, at least. That’s also, doubtless, way too much information. Here’s what I believe is happening at this point:
2008 Metrics and Initial Text
I’m currently going through blogs, noting:
- Brief factual information for each one: the name (using the orthography in the page title if there’s a discrepancy), the tagline or motto if any, who it’s by (if that’s clear, or if it’s a group), when it started, the crude visibility measure, up to three of the most popular categories, tags or labels (if that’s easy to determine), the software used (if that’s obvious), whether the text is sans or serif (noting fully justified text and odd text/background combinations), and the URL.
- Number of posts during March-May 2008 (if it’s possible to determine that)
- Total length of posts (if it’s not too difficult to determine that)
- Number of comments and number of figures (if it’s plausible to determine either or both)
- The same information for March-May 2007 if I didn’t pick it up before: I’m using a second method to get at full text of posts for some WordPress blogs with “hidden-post” archives (using page numbers).
- The general affiliation of the blogger, if that’s evident (e.g. “Academic librarian,” not “College of William and Mary”).
- In some but not all cases, a sentence about the nature of the blog. I’ll have more of an explanation for “but not all” when the project’s done–but it’s fair to say that the typical grandmother’s advice enters into it, as do various conflicts of interest.
- In a few (very few, actually) cases, a fragment of a post that I found particularly intriguing.
The raw numbers go into a spreadsheet. The text goes into a Word chapter, alphabetically by sortable blog name (which is how I’m doing the checking)–but only as the first pass of a multipass textual process.
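One way to picture the spreadsheet side is one row per blog, written in the same alphabetical order as the text chapter. This is a sketch under stated assumptions: the `BlogRow` columns are a hypothetical subset of the metrics listed above, and "sortable name" is assumed to mean ignoring case and a leading "The", "An" or "A".

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional

@dataclass
class BlogRow:
    # Hypothetical column set; the real spreadsheet layout isn't described.
    name: str
    visibility: float
    posts_2008: int
    words_2008: Optional[int]      # None when length can't reasonably be determined
    comments_2008: Optional[int]
    figures_2008: Optional[int]

# Assumption: "sortable name" ignores case and a leading English article.
LEADING_ARTICLES = ("the ", "an ", "a ")

def sortable_name(name: str) -> str:
    key = name.casefold()
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            return key[len(article):]
    return key

def rows_to_csv(rows: List[BlogRow]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(BlogRow)])
    writer.writeheader()
    for row in sorted(rows, key=lambda r: sortable_name(r.name)):
        writer.writerow(asdict(row))  # None values are written as empty cells
    return buf.getvalue()

print(rows_to_csv([
    BlogRow("The Middle Shelf", 1.2, 20, 9000, 14, 2),   # made-up example rows
    BlogRow("Back Room Notes", 0.9, 12, None, 3, 0),
]))
```

Leaving unknown values as empty cells, rather than zeros, keeps "couldn't determine" distinct from "none" when averages are computed later.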
This is not a fast process, but having a two-display setup helps a lot! (I did it the cheap way: my new “desktop PC” is actually a notebook, so it has automatic second-display support for my retained LCD display.) When I’m doing this, there are four active windows, and three of them (two Word, one Firefox) need to be nearly full-screen size. (The Excel window is wide but only five rows tall.)
How fast is not fast? It can take anywhere from 45 minutes to 1.5 hours to go through five blogs, and I try to do five at a time. There’s a two-minute (or so) setup process, but I find that doing more than ten at a time rarely works well. Some days I do five, some ten, some (rarely) fifteen…and some none at all, because I’m entirely focused on other things.
As of this writing, I’ve done 295 of 583 (there was a duplicate in that “584” count)–but that turns out to be 289 of 577, because I’ve deleted seven blogs along the way, typically because they’ve disappeared entirely and weren’t in an earlier study or because they’re defunct and were alive for too short a period to be included (e.g., your typical “create a blog for class” blog).
So I’m just barely halfway through. If I average five blogs a day from here on out, I should be done with this phase around the end of September. If I average ten blogs a day, I’d be done in early September. My current target–taking into account Cites & Insights, columns, mental health, maybe a short vacation–is 50 blogs a week, which should get me through the whole list right around the time I turn 63…
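The pacing estimates are simple division. This sketch converts the stated rates into days for the 288 blogs still to check (577 minus 289), and the 45-to-90-minutes-per-five figure into hands-on hours. It counts days only and makes no assumption about the date of writing.

```python
import math

def days_to_finish(remaining: int, blogs_per_day: float) -> int:
    """Whole days needed at a steady rate (a partial day counts as a day)."""
    return math.ceil(remaining / blogs_per_day)

def hours_of_checking(remaining: int, minutes_per_five: float) -> float:
    """Hands-on time, given minutes spent per batch of five blogs."""
    return remaining / 5 * minutes_per_five / 60

remaining = 577 - 289  # blogs left after 289 of 577 are done

for label, per_day in [("5 a day", 5), ("10 a day", 10), ("50 a week", 50 / 7)]:
    print(f"{label}: {days_to_finish(remaining, per_day)} days")
print(f"{hours_of_checking(remaining, 45):.1f} to {hours_of_checking(remaining, 90):.1f} hours")
```

That works out to 58, 29 and 41 days respectively, and roughly 43 to 86 hours of actual checking.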
But wait! There’s more!
At that point, depending on various factors (phase of the moon, feedback, offers of support, health, what have you), I could do another “additions” pass–picking up more English-language liblogs that seem to fit the general criteria, probably by working from blogrolls again. In saner moments, I say this won’t happen. If it does, of course, then there’s the metrics process for each of those blogs…and, since 2007 metrics would also be needed, I figure 1.5 to two hours for each fivesome.
I might also do a “subtractions” pass. Maybe the non-English blogs in the 2006 survey should be deleted. Maybe there are other categories that should be deleted… But at some point I’ll have a “complete” spreadsheet matched with a set of chapters.
Once all the metrics gathering is done, the analysis begins. Lots of analysis.
How much and what kind of analysis? I’m not quite sure.
I am sure I’ll look at averages, medians, standard deviations, outliers and quintiles for each significant metric–and that “significant metrics” will include the changes from 2007 to 2008, for those blogs with posts in both quarters.
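The planned summary statistics map directly onto Python's `statistics` module. This is a sketch, not the eventual method: the two-standard-deviation outlier rule is an assumption (no rule is stated), and "change" is taken here as the relative change in post counts, computed only for blogs with posts in both quarters.

```python
import statistics
from typing import Dict, List

def summarize(values: List[float]) -> Dict[str, object]:
    """Mean, median, standard deviation, quintile cut points, and outliers."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation; needs 2+ values
    return {
        "mean": mean,
        "median": statistics.median(values),
        "stdev": sd,
        "quintile_cuts": statistics.quantiles(values, n=5),  # 4 cut points (Python 3.8+)
        # ASSUMED outlier rule: more than two standard deviations from the mean.
        "outliers": [v for v in values if abs(v - mean) > 2 * sd],
    }

def relative_changes(posts_2007: Dict[str, int], posts_2008: Dict[str, int]) -> Dict[str, float]:
    """2007-to-2008 change, only for blogs with posts in both quarters."""
    return {
        name: (posts_2008[name] - before) / before
        for name, before in posts_2007.items()
        if before > 0 and posts_2008.get(name, 0) > 0
    }
```

The output of `relative_changes` is itself a list of numbers, so it can be fed back into `summarize` as one more "significant metric".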
I suspect I’ll do some correlations–and I’m sure I won’t do the “toss everything into SASS and see what significant correlations emerge” style of correlation. (I don’t have access to SASS, for one thing, and I’m acutely aware that statistical correlation does not imply causation or, in fact, significant correlation.)
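A targeted correlation, as opposed to the "toss everything in" approach, means picking a pair of metrics in advance (say, visibility against posts per quarter; that pairing is only a hypothetical example) and computing Pearson's r for that pair alone. This sketch writes the formula out by hand so it runs on any Python 3; on Python 3.10 and later, `statistics.correlation` gives the same result. Like any r, it measures linear association only, not causation.

```python
import math
from typing import Sequence

def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient for two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length samples of at least 3 values")
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    syy = math.fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)
```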
Wrapping it all up
Then I’ll write the manuscript–several chapters of analysis (how many I don’t yet know), followed by the alphabetic chapters, each of which will require a rewrite (for example, filling in pieces that emerge from overall analysis).
And then I’ll produce it–probably as a book, possibly with a few overall comments here or in C&I.
When? I honestly have no idea. If I manage to get it out before ALA Midwinter Meeting 2009, I’ll be fairly happy.
Now, if someone were to come forward with some form of adequate sponsorship, I’d be delighted to make a PDF version free, or to run major amounts of the analysis in Cites & Insights. Otherwise, not so likely.
Thus endeth Part the Second. Now, off to do today’s five or ten blogs. Where am I? Well, there’s one letter that begins the names of one out of every five liblogs, and it’s right in the middle of the alphabet. So, “where the L am I?” answers itself.