Correlations (Liblog Landscape 2007-2008, 7)

What do CogSci Librarian, Weibel Lines, Marlene’s corner, etc., The Misadventures of Super_Librarian, and info NeoGnostic have in common?

The Liblog Landscape 2007-2008: Introductory Offer

You’ll find the answer and more in The Liblog Landscape 2007-2008: A Lateral Look.

This 285-page 6×9 trade paperback looks at 607 liblogs (nearly all English-language) and, for most of them, how they’ve changed from 2007 to 2008.

From now through January 15, 2009, and only from Lulu, The Liblog Landscape 2007-2008 is available for $22.50 plus shipping.

On January 16 or thereabouts, that price will go up to $35.00. If and when the book is available on Amazon, it will immediately sell for $35.00.

Correlations (chapter 7)

When I was working on this study, colleagues offered a few suggestions on possible correlations–e.g., older liblogs might show larger decreases in posts than newer ones.

This chapter looks at a few dozen possible correlations between pairs of metrics. I normalized the metrics and used Excel’s CORREL function, which appears to be identical to the PEARSON function: both calculate Pearson’s product-moment coefficient, the only readily available measure of correlation between two sets of numbers that I could find.
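For readers without Excel, here is a minimal Python sketch of the same coefficient. It is not the spreadsheet used for the book; the function name and the sample numbers are hypothetical, made up purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson's product-moment coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical post counts for five imaginary blogs (not data from the study).
posts_2007 = [10, 20, 30, 40, 50]
posts_2008 = [12, 18, 33, 41, 48]
print(round(pearson(posts_2007, posts_2008), 3))
```

The result always falls between -1 and 1, and it does not change if either set of numbers is multiplied by a positive constant or shifted by a fixed amount.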

For those cases where the correlation is medium (between 0.3 and 0.5 or between -0.3 and -0.5) or strong (greater than 0.5 or less than -0.5), I note the correlation and include a scatterplot for the two values.
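The cutoffs above can be written as a small Python sketch. The function name is hypothetical, and the treatment of a coefficient that lands exactly on 0.3 or 0.5 is an assumption (both are counted as medium here), since the text does not say which side the boundaries belong to.

```python
def correlation_strength(r):
    """Label a Pearson coefficient as strong, medium, or weak.

    Assumption: values of exactly 0.3 or 0.5 (in magnitude) count as medium.
    """
    magnitude = abs(r)  # the sign only says whether the relationship is direct or inverse
    if magnitude > 0.5:
        return "strong"
    if magnitude >= 0.3:
        return "medium"
    return "weak"

for r in (0.62, -0.4, 0.1):
    print(r, correlation_strength(r))
```

Using the absolute value means a coefficient of -0.4 gets the same label as 0.4, matching the way the ranges above are stated for both signs.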

Statistical extremists sometimes discuss weak correlations–those with a magnitude below 0.3. Fact is, almost any two sets of numbers will show some correlation (that is, will have a Pearson’s product-moment coefficient other than exactly 0.000)–but I see no reason to believe that weak correlations mean anything at all, other than that you’re comparing two sets of numbers. So I do note some of the weak correlations, mostly to say that there’s no significant correlation between the two metrics.

Oh, as to the age suggestion? I found no useful correlation between age of blogs and any other metric.

A couple of notes about figures in this book

The Liblog Landscape 2007-2008 includes quite a few line graphs and a few scatterplots. In all cases, I used Excel 2007’s graphing functions and tuned the results. Most graphs and plots represent more than 400 data points. The only graphs and plots that use non-zero baselines are those dealing with percentages, where the baseline is -100% due to the nature of the data.

Purists may object that the graphs and plots are chartjunk for either of two reasons:

  • In most cases, the axes–while showing numbers–aren’t labeled (that is, there are no words below or to the side of the axes).
  • In some cases, one or both axes are logarithmic rather than linear.

Dealing with the second case first, I believe logarithmic axes are chartjunk only if there are no numbers on the axis. When you see evenly-spaced marks numbered “1 10 100 1,000” you’re dealing with a logarithmic axis–and I don’t believe that’s deceptive. Some sets of data simply require logarithmic charting to display meaningfully–and some data is logarithmic in character (to throw in a little philosophy). (Nearly all audio performance graphs are logarithmic in most scales–frequency, distortion percentage, power–simply because sound has logarithmic characteristics.)
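To see why those marks come out evenly spaced, here is a small Python sketch of how a value maps to a position along a base-10 logarithmic axis. The function name and the 1-to-1,000 axis range are assumptions chosen to match the example above, not settings taken from the book’s charts.

```python
import math

def log_axis_position(value, axis_min=1.0, axis_max=1000.0):
    """Fractional position (0 = start, 1 = end) of value on a base-10 log axis."""
    if value <= 0 or axis_min <= 0 or axis_max <= axis_min:
        raise ValueError("a log axis needs positive values and axis_max > axis_min")
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span

# Each tenfold step covers the same share of the axis: roughly 0, 1/3, 2/3, 1.
positions = [log_axis_position(v) for v in (1, 10, 100, 1000)]
print([round(p, 3) for p in positions])
```

Equal distances on the axis stand for equal ratios rather than equal differences, which is why data spread over several orders of magnitude stays readable.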

The first one’s simple enough. In most cases, it didn’t make sense to label the horizontal axis but not the vertical axis–and there’s a clear issue with labeling the vertical axis. That issue could be stated as “26 picas” or “4 1/3 inches.” Either way, it’s the width available between the margins of a typical 6×9″ book: The width of the text block. Make that block wider, and you either have problems with the binding margin or have too-narrow outer margins.

26 picas is a nearly ideal width for 11-point or 12-point text–within the 55 to 65-character range usually regarded as optimal for reading. But it’s a little narrow for a graph with a lot of information…particularly after you add numeric labels for the vertical axis and a little white space between the graph and its border. That narrows the graph area to at most four inches and more typically around 3.5 inches.

What happens when you add a vertical axis label? You lose another half inch or more.
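The arithmetic behind that squeeze, as a rough Python sketch. The variable names are hypothetical; the 3.5-inch and half-inch figures are the approximate ones given in the text, and the conversion uses the standard 6 picas to the inch.

```python
PICAS_PER_INCH = 6  # 12 points per pica, 72 points per inch

def picas_to_inches(picas):
    """Convert a measure in picas to inches."""
    return picas / PICAS_PER_INCH

text_block_in = picas_to_inches(26)   # the full text block: 4 1/3 inches
typical_graph_in = 3.5                # typical graph area after tick labels and white space
axis_label_cost_in = 0.5              # "half inch or more" lost to a vertical axis label
remaining = typical_graph_in - axis_label_cost_in

print(f"text block: {text_block_in:.2f} in, graph with axis label: {remaining:.1f} in or less")
```

On these figures, a labeled vertical axis leaves about three inches or less for the plot itself, out of a text block of 4 1/3 inches.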

I found that graphs were consistently squeezed too tight as a result–they became even harder to interpret.

In the end, I eliminated most axis labels, stating them in the text that precedes or follows each graph instead. It was simply a tradeoff of proper graph presentation standards versus graph readability. (The other alternative–going to 8.5×11 for the book, with a 6″ text block–is great for graphs but problematic for everything else.)

Who’s here, part 7

Fifty more blogs with the number of index entries for each one–noting, once again, that some of the most interesting and worthwhile blogs have only one index entry each, because this is a quantitative study, not a qualitative one.


You’ll find the answer on pages 98-99.
