Misleading graphs: an anecdote

This is the kind of thing I would have posted on FriendFeed to get quick reactions from a few hundred smart library folk. Unfortunately, FriendFeed’s really gone now, and frenf.it isn’t quite there yet. (Maybe soon.) So there may be more casual posts here, although they almost certainly won’t get the kind of quick, open feedback they did there.

I use Excel for my “statistical” work and to create charts. (Excel 2010 at the moment, maybe 2013 in a few months…)

One thing I’ve always liked about Excel’s graphs, at least as starting points for customization, is that they’ve been “honest”–the Y axis always begins at zero, unless there are negative numbers in the dataset.

Today, I was finishing Chapter 13 of The Open Access Landscape (yes, I’m a little ahead; the posted version will appear on May 22) and adding the “bonus graph” that only appears in the book version (if the book appears–and if it does, it now seems likely there will be some other exclusive content, but that’s another post): a stacked-bar graph showing articles by year (2011 through 2014) with segments for articles in free OA journals, articles in journals with APCs (“pay”), and articles in journals that probably have APCs but where I can’t find the amount (“unknown”).

As usual, I selected the table with my mouse, clicked on Insert, Bar graph, the stacked-bar option.

And noticed right away that the graph was a little more dramatic than I’d expected.

It didn’t take long to figure out why: Excel had started the Y axis at 2,400 articles rather than at 0.

It didn’t take much longer to fix, yielding a really non-dramatic graph that happens to be accurate and not misleading.

I’m still not sure why Excel made this choice. It could be because, unlike all the earlier similar graphs, the range of numbers is so narrow. That’s especially true of the “free” numbers, which make up 98%-99% of the total (there just aren’t many APC-charging OA history journals!) and range only from 2,683 to 3,039. (The “pay” numbers range from 32 to 56.) Setting the vertical range from 2,400 to 3,200 instead of from 0 to 3,100 made the changes more obvious and made the “pay” segment at least a little visible, but it also made the graph misleading. (Newspaper charts of the Dow Jones Industrial Average do this every day: they turn tiny little deviations into Big Dramatic Changes.)
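For a rough sense of how much the truncated axis exaggerates things, here is a small Python sketch of the arithmetic. It is only an illustration: it treats the lowest and highest “free” counts (2,683 and 3,039) as two bars drawn from the bottom of the chart, and the function name `apparent_ratio` is made up for this example, not anything from Excel.

```python
def apparent_ratio(low, high, axis_min=0.0):
    """Return how many times taller the `high` bar is drawn than the `low` bar
    when the value axis starts at `axis_min` instead of zero."""
    if axis_min >= low:
        raise ValueError("axis_min must be below the smaller value")
    # Drawn bar height is proportional to (value - axis_min).
    return (high - axis_min) / (low - axis_min)


# Endpoints of the "free" range quoted in the post (illustrative pairing).
low, high = 2683, 3039

honest = apparent_ratio(low, high, axis_min=0)
truncated = apparent_ratio(low, high, axis_min=2400)

print(f"Axis from 0:     tallest bar drawn {honest:.2f}x the shortest")
print(f"Axis from 2,400: tallest bar drawn {truncated:.2f}x the shortest")
```

Under those assumptions, a real difference of about 13% is drawn as a bar more than twice as tall, which is roughly the kind of “drama” a truncated axis adds.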

The moral of this story? Even though Excel’s defaults are usually reasonably honest, you still need to check what it actually did.
