The data you need? Musings on libraries and numbers

One of many tweets from ALA Midwinter said something like this, apparently quoting a speaker:

We need more techies with “library values” to give libraries the data we need.

That’s a paraphrase, taken out of context. I found myself thinking about it–and deciding it was worth commenting on even if my assumed context is entirely wrong. (Which it might be. Don’t point me at streaming video for the program: That’s really not the point.)

My basic thought:

There’s no shortage of library data, and there’s no shortage of people with both technical skills and library values to massage that data. What there may be a shortage of: Libraries/librarians ready to use that data–and decide what data they actually need or can/will use.

That’s a wildly overbroad statement, and I may be dead wrong. I’m basing the statement on my own experience, what I know from a couple of colleagues, and what I see or don’t see in the library conversations and literature. (Well, I don’t see much of the library literature these days, at least not the literature that’s behind paywalls.)

The data

There’s plenty of data. IMLS does a first-rate job of gathering and reporting fairly detailed figures on some 9,000 public libraries on an annual basis. IMLS does its own reports based on that data–but it also makes the datasets freely available.

Pro tip: If you want to massage IMLS data and don’t have or know Access, download and unzip the Access version anyway: Excel can open the Access database once you tell it to do so–that is, once you use the Open file pull-down menu and select “Access databases” from the list–and once you convert the whole thing to a table, it works nicely as a humongous spreadsheet. Then you can select the columns you actually need, making it a lot more workable. Do read the documentation. Unless you’re much cleverer than I am, I wouldn’t mess with the flat file: Access-via-Excel is a lot easier. If you’re an Access guru, of course, you can ignore that.

NCES does the same for more than 3,000 academic libraries, although only once every two years. Same pro tip applies. NCES even allows you to do some “compare library X with comparable institutions” analysis on the fly. (If the columns and documentation for the NCES academic library tables and the IMLS public library tables have some vague similarities…NCES used to do the public library tables as well.)

There are other sources, to be sure, but these are the biggest. (A couple of ALA divisions produce sets of numbers for partial sets of libraries…for a price. I haven’t looked at those: See “for a price.”)

Does your library work with those numbers at all–other than to report your own stats, that is?

The data you need

There’s the rub. NCES and IMLS provide impressive, readily operable sources of raw data. But it’s probably not the data you need.

What is the data you need? More to the point, what data will you pay attention to, use, pay for (that is: pay to have massaged into the form you want and written up so you find it meaningful)?

I’d love to have answers to that question, and I suspect those answers differ by type of library and subtypes within types. (For that matter, defining a subtype is tricky…)

I’ve done some work with both data sources, partly out of curiosity, partly out of contrarian stubbornness, partly pursuing ideas I thought could be broadly useful. For example:

  • I was convinced that the “public libraries are closing all over the place” meme, at least for the United States, was not only harmful as a self-fulfilling prophecy (“if everybody else is giving up on them, why should our town keep funding ours?”) but was probably false. It was and is. I proved that in the April 2012 and May 2012 Cites & Insights (with a 2010 update in September 2012). As far as I can tell, that proof has had very little impact in the profession. (A couple of blogs linked to it.)
  • I prepared the book Give Us a Dollar and We’ll Give You Back Four (2012-13), based on the IMLS database and designed to help public libraries–specifically smaller libraries without their own research departments or big consulting contracts–prepare their cost-benefit story to help gain or at least retain budgets. I deliberately priced it modestly–it’s currently $9.99 for Kindle or PDF e-version and much cheaper than most library books in paperback or hardcover versions–so that even the smallest libraries could afford it. I made it as easy as possible for libraries to get their relevant data points from me if they didn’t have them handy. While the book hasn’t been an utter failure, it also hasn’t been the kind of success–so far–that would encourage doing a new, leaner, more graphic version next year: To date, 67 copies have sold. Of those, 6 libraries have asked for their data. (But that’s OK: Maybe every library keeps those figures handy.) It’s quite possible–even probable–that I just haven’t figured out how to make this data meaningful; unfortunately, there’s been little feedback to help.
  • To see how graphs could improve the story, I did Graphing Public Library Benefits, e-only, originally $9.99, now $4.00. Care to guess how many copies of that I’ve sold? Zero.
  • A colleague has an outstanding track record in working with library data and making it accessible. He has a PhD. He’s now working in other library areas because he couldn’t find a paying job working with library data. I would quote him on library interest in longitudinal data–time series, showing how things change–but that would just be depressing.
  • For years, Tom Hennen produced HAPLR, Hennen’s American Public Library Ratings, and offered very inexpensive “group comparison” reports for individual public libraries. For whatever reason, HAPLR seems to have ceased–the most recent report is either two or three years old.

The data you need, redux

Most national reports deal with averages over time–and while the “over time” part is vital, averages vastly oversimplify the library picture. Sometimes, I believe averages are actually harmful; mostly, I believe they’re not very useful.

That will be the underlying theme of an upcoming article in Cites & Insights, I think–one that was planned for the March issue, until I became contrarily interested in another meme, the “fact” that academic library circulation (as opposed to e-usage) has been dropping all over the place and continues to fall in all or nearly all academic libraries.

I already knew “all” was nonsense. I assumed “nearly all” was right, but began to wonder what “nearly” actually meant. Did 1% of academic libraries have steady or increasing circulation? Five percent? Ten percent–as unlikely as that seems?

So I set aside the “trouble with averages in public library data” article–which I hadn’t actually started writing yet–and spent some time looking at academic library circulation and circulation per capita, first comparing FY2008 and FY2010 (the most recent available), then going back to FY2006.
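For anyone who wants to try the same kind of comparison, here is a minimal sketch of the arithmetic, not the actual method behind the article. The library names and circulation figures are invented, and `share_not_declining` is a hypothetical helper; real NCES tables would need their own loading and cleanup first.

```python
def share_not_declining(earlier, later):
    """Return the fraction of libraries whose circulation held steady or rose.

    Only libraries that reported in both years are compared.
    """
    common = earlier.keys() & later.keys()
    if not common:
        raise ValueError("no libraries reported in both years")
    steady_or_up = sum(1 for lib in common if later[lib] >= earlier[lib])
    return steady_or_up / len(common)


# Invented circulation counts keyed by a made-up library ID.
fy2008 = {"A": 1000, "B": 500, "C": 800, "D": 1200}
fy2010 = {"A": 900, "B": 550, "C": 800, "E": 300}

share = share_not_declining(fy2008, fy2010)
print(f"{share:.0%} of comparable libraries had steady or rising circulation")
```

Libraries D and E drop out because each reported in only one of the two years, which is itself a choice worth stating whenever a percentage like this gets quoted.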

The results will make up most of the March 2013 Cites & Insights, when I publish that issue, and without offering too many spoilers let’s just say that ten percent is wrong–but not the way you might expect.

I could rush that issue out, as early as the end of this week or early next week, if I thought it would be received well and used broadly. At this point, I have no reason to believe that’s true.

What would, I think, be interesting is to see whether there are reasonable predictors of continued healthy circulation in academic libraries–what other factors appear to correlate well with, let’s say, traditional library use. But that’s a significant project. Even at my “pretty much retired, enjoy doing this, so can charge much lower fees than any proper consultant” rates, it would almost certainly be a four-digit job.

Similarly, I’d love to do some time-based analyses of public library performance within groups: Not averages, but percentages and correlations. Not to find “stars” (LJ has that down pat) but to help libraries see where they are and where they could be. And to help tell the complex story, not of The Average Academic Library or The Average Public Library but of the thousands of real, varied, diverse, actual libraries.

Here’s the thing: I don’t know whether I’m asking the right questions. I don’t know whether there is analysis that would be worth doing. I don’t know whether I can find the ways to make those facts meaningful and useful to librarians.

And I don’t know whether librarians are willing to deal with data at all–to work with the results, to go beyond the level of analysis I can do and make it effective for local use.

I wonder how many public and academic librarians really get, say, the difference between overall averages (e.g., circulation per capita for U.S. public libraries as a whole), institutional averages (e.g., the average of each library’s circulation per capita, which is not the same figure) and median figures (e.g., the point at which half of libraries circulate more per capita and half less). I wonder how many understand at a gut level that many (maybe most) real-world statistics don’t follow the neat bell-shaped curve or the not-so-neat power-law curve–and why that matters.
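A small worked example may make the three figures concrete. This is only a sketch with three invented libraries (the populations and circulation counts are made up, and `circulation_summaries` is a hypothetical helper), but it shows how far apart the numbers can land when one big library dominates the totals.

```python
from statistics import mean, median

# Invented libraries: (service-area population, annual circulation).
libraries = [
    (1_000, 20_000),     # small library, 20 circs per capita
    (5_000, 40_000),     # mid-sized library, 8 per capita
    (100_000, 300_000),  # large library, 3 per capita
]


def circulation_summaries(libs):
    """Compute three different 'typical' circulation-per-capita figures."""
    per_capita = [circ / pop for pop, circ in libs]
    total_pop = sum(pop for pop, _ in libs)
    total_circ = sum(circ for _, circ in libs)
    return {
        # Overall average: all circulation divided by all people served.
        "overall": total_circ / total_pop,
        # Institutional average: mean of each library's own per capita rate.
        "institutional": mean(per_capita),
        # Median: half of libraries fall above this rate, half below.
        "median": median(per_capita),
    }


summary = circulation_summaries(libraries)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these made-up figures the overall average is about 3.4, the institutional average about 10.3, and the median 8. The large library pulls the overall figure toward its own rate, while the small one pulls the institutional average up.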

Do LIS students get some training in real-world statistics (“statistics” may be too fancy a word; this is mostly pretty low-level stuff)? Is there a good book for them to use once they’re out in the real world? Would there be a real market for such a book if it existed? (Say a title like The Mythical Average Library: Dealing with Library Statistics)

Wish I knew the answers. Wish I knew whether I had a useful and possibly mildly remunerative role to play in providing answers. (There are certainly agencies that do yeoman work here–Colorado’s Library Research Service for one. I’m not faulting those agencies.)

Feedback invited. Please. Here or as email to

Modified later on January 29 to reduce the whininess and try to make it less about needing to be paid and more about whether this stuff’s worthwhile in general. Which may or may not help.

3 Responses to “The data you need? Musings on libraries and numbers”

  1. Stephen Michael Kellat says:

    Matt Asay wrote at The Register once upon a time that there is tons of data out there but few interpret it. You’re doing the heavy lifting of interpretation but few are paying attention. THAT is a problem.

    I wish I had a solution for fixing it. You need to be front and center providing data interpretation and be very well compensated for speaking truth to those who must hear it.

  2. waltcrawford says:

    Thanks for that, Stephen–but while I think I’m good at massaging the data, I still have a ways to go (at least sometimes) in figuring out how to make it meaningful and effective. It’s not easy.

  3. Emily Weak says:

    One of the reasons I started my website, Hiring Librarians, is because although there is no shortage of opinions about how to get hired by a library, there was not a lot of fact-based stuff.
    What I did is distribute surveys to hundreds of people who hire librarians. I’ve posted the results of the multiple choice questions as graphs and tables, but I think the more valuable service is presenting each anonymous response as a post on my blog.
    I think this way of presenting data has a lot of advantages over just making a dataset available, or even making a really cool graph. Posting responses over time presents the data in a way that allows it to be narrative, an individual’s opinion, while the standardized format allows it to also be data. It’s digestible, doled out a post at a time, rather than as an overwhelming spreadsheet. And ultimately the totality becomes present in each reader’s life – an understanding changes and grows over time, and my hope is that people stop mistaking one person’s opinion about hiring as fact.
    I think that the way facts become meaningful is not just when they are used once, but when they are used repeatedly, when they become incorporated into our conversations and doings. As you are doing here, by asking questions and presenting answers for people to comment on.