If you’re in a library (either public or academic) and know of, and can access, “medium-size data” that regularly comes out of your ILS or other source in some semi-tabular form (comma-separated values, spreadsheet, database, table, whatever) and that could stand some analysis but is clumsy to deal with:
I could use your help. Specifically, I’d like to see the labels and a few rows of the data from such a dataset, with notes on how often it’s generated and the typical overall size. (I’m assuming that there is no identifiable borrower information in any of this: If there is, I don’t want it.)
Please either contact me (in comments or to waltcrawford@gmail.com) or send me the stuff–in some ways, comma-separated values are best, since they can’t harbor malware, they’re compact and (as far as I know) most programs can generate them. Send it as an attachment to that same email address.
If you’re one of the first three to send me something (I’ll add to this post when/if this happens), and if I’m able to use the submission to help me prepare a convincing proposal for a book (discussed below), and if the book is accepted by a real publisher…then you’d be mentioned in the acknowledgments and receive an actual physical copy of the book, autographed if you prefer. Alternatively, if this all leads to a webinar or some equivalent, you’d be mentioned in acknowledgments and I’d find some appropriate way to provide another form of thanks.
There are a lot of “Ifs” in that last paragraph, so maybe a little background will be useful.
Background
I had an idea for a book at one point–originally The Mythical Average Public Library and later Mostly Just Numbers, which has morphed to Mostly Numbers in the meantime.
I discussed the idea in this post in May 2013, actually preceded by this post in February 2013 and this post in March 2013 and, to some extent, in this post in April 2013.
Then I started working on other projects, and the less said about the current sales of those self-published books–so far at least–the better.
Along the way, I added two more brief comments on the possible project: One on June 10, 2013 and one on June 26, 2013.
Given the rousing response and dismal results of recent self-pub efforts, I’ve pretty much concluded that self-publishing this would-be book is absurd. One difference between the library-sayings and public-library-benefits projects and this one: The first was both fun and a voyage of discovery; the second was at least a voyage of discovery. This one would be trying to help librarians using some techniques I’ve “discovered” (they were there all the time, but finding them and thinking through their implications can be tricky)–without “mansplaining” or otherwise losing the whole point.
Doing something that’s inherently interesting and finding that it’s met with a collective yawn (or, rather, a collective total absence of any interest at all) is one thing. Doing something that’s mostly fairly hard work and facing a similar “Haven’t you gone away yet, old man?” response (or, rather, non-response) is quite another.
And yet, and yet, it’s not entirely easy to just give up and move on. It doesn’t help that, in the last couple of months, I’ve “discovered” a couple of additional techniques that are very powerful and not at all obvious (at least to me)–one of which probably saved me 90% of the time required to do one complex set of analyses.
So…
I don’t work in a library. I haven’t worked in a library for several decades, although I was working with a number of library statistical reports more recently–none of which I have access to any more. (None of which exist any more except possibly in some libraries as historical items…)
Having real example(s) of datasets that are potentially useful but a little cumbersome to analyze might help me decide whether this project is worth trying to sell to a publisher (or turning into a webinar or short course or something; in any case, something with somebody else’s backing, given the obvious quality of my own marketing efforts…).
I still plan to use the NCES academic library statistics and IMLS public library statistics as the basis for two chapters, to help librarians see how they can prepare their own specialized comparisons with relatively little effort. But adding to that a set of examples of how “advanced” spreadsheet techniques can make everyday (every month? every quarter? every year?) library analysis tasks easier and more productive…that might be worthwhile to more people.
To do that requires realistic examples. Thus my request.
Various somewhat obsolete versions of the potential book/webinar’s outline will be found in some of the linked posts.
If you can help and think it’s worthwhile, please do.
Lack of any response will also help me decide what to do, in its own way.
I’d be happy to send you something if only I could think of something to send. I’m not really sure what a “medium-sized data set” is, or what sort of data we get that would require more analysis than, say, sorting things in a spreadsheet by date of acquisition and number of circs, or whatever. So, if you can give me an example or three, I’ll see what I can do.
Hi Laura,
I don’t have examples–if I did, I wouldn’t be asking. By “medium-sized” I mean not Big Data (let’s say no more than 25,000 rows at a time) and not data so small that you can just glance at it (let’s say at least 100 rows at a time). Basically, it’s data that Excel can handle and that benefits from aggregation of some sort (subtotals, averages, etc.).
I guess I’m thinking of cases where you might not *require* more analysis but might *benefit* from more analysis–e.g., circs per item per year sorted by Dewey number or decade of numbers, or other factors that aren’t as straightforward.
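To make that concrete, here is a minimal sketch in Python of the "circs per item per year by Dewey hundreds" idea. It is only an illustration: the column names (call_number, total_circs, years_held) and the sample rows are invented, not taken from any real ILS export, and a spreadsheet pivot table or subtotal would do the same job.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: column names and values are invented for illustration.
SAMPLE_CSV = """call_number,total_circs,years_held
004.6 KUR,12,4
025.1 EVA,3,6
027.4 CRA,15,5
510 STE,7,2
516.2 MAO,2,4
641.5 CHI,30,5
"""

def dewey_hundred(call_number):
    """Return the Dewey hundreds class (0, 100, ..., 900) of a call number."""
    number = float(call_number.split()[0])
    return int(number // 100) * 100

def circs_per_item_year(rows):
    """Pooled rate per Dewey hundred: total circs divided by total item-years."""
    circs = defaultdict(int)
    item_years = defaultdict(int)
    for row in rows:
        group = dewey_hundred(row["call_number"])
        circs[group] += int(row["total_circs"])
        item_years[group] += int(row["years_held"])
    return {group: circs[group] / item_years[group] for group in sorted(circs)}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = circs_per_item_year(rows)
for group, rate in rates.items():
    print(f"{group:03d}s: {rate:.2f} circs per item per year")
```

Note that this pools circulations and item-years within each group rather than averaging each item's own rate; which of the two is more useful would depend on the question being asked.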
If there aren’t such situations in a typical library, then the possible book really is sort of pointless–it would fill a need that doesn’t exist. That may be the case.
Well, one problem is that a lot of ILSes just don’t give you that much data, and/or the kinds of data that one might want analyzed come from too many different systems, with too little from any one of them. For instance, I’ve often heard people wishing they could analyze ILL trends as compared to circulation of items in the collection, but there’s so much apples/oranges to the data in question (and so little of it) that I don’t think you really could.