Archive for April, 2017

GOAJ: April 2017 update

Sunday, April 30th, 2017

It’s April 30–the last day of the month, when I fetch usage statistics for my websites (as always, omitting part of that last day), so here’s an update on GOAJ. (I might have stopped doing these, but the GOAJ download numbers are still astonishing, so…):

  • Paperbacks: No change. Two copies of GOAJ itself sold. So far, none of the others.
  • Dataset: 8 more views, 1,067 total views; 410 total downloads.
  • GOAJ: 45 total Lulu copies, 2,741 more (total 21,330* copies from my site), total 21,375. Actual number of human downloads probably around 500 for April.
  • Subjects: 20 total Lulu copies, 58 additional, 433* other copies, 453* total.
  • Countries: 8 total Lulu copies, 206 additional, 1,793* total other copies, 1,801 total.
  • C&I: New totals 1,463* copies of the excerpted GOAJ version (16.5) and 4,259* copies of “APCLand and OAWorld” (16.4).

*Missing downloads from 11/13-11/30/16 and, for C&I, 11/13-12/15/16.
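Each total in the list above is the sum of Lulu copies and copies from my site. Here is a minimal Python sketch that reconciles them, using only the figures quoted in the bullets. The dictionary layout and the function name `reconcile` are illustrative, not part of any real tool.

```python
# Figures copied from the April 2017 bullets. The site counts are starred in the
# list because downloads from part of Nov./Dec. 2016 are missing.
COUNTS = {
    "GOAJ": {"lulu": 45, "site": 21_330, "reported_total": 21_375},
    "Subjects": {"lulu": 20, "site": 433, "reported_total": 453},
    "Countries": {"lulu": 8, "site": 1_793, "reported_total": 1_801},
}


def reconcile(counts):
    """Return {title: (computed_total, reported_total)} for each book."""
    return {
        title: (c["lulu"] + c["site"], c["reported_total"])
        for title, c in counts.items()
    }


if __name__ == "__main__":
    for title, (computed, reported) in reconcile(COUNTS).items():
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{title}: {computed:,} computed vs {reported:,} reported ({status})")
```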

Gray OA

Gray OA 2012-2016 (Cites & Insights 17.1) shows a total of 1,263 downloads to date, with no apparent recognition anywhere else that the Shen/Björk “predatory articles” numbers have been shown to be so dramatically wrong; the dataset shows 258 views and 68 downloads.

Notes on comments

Friday, April 7th, 2017
  1. By default, comments are off (quite a few posts don’t really need commenting, and every post draws robospam). I don’t always remember to turn them on in cases where feedback is desirable.
  2. The spam software I formerly used allowed me to review all the spam, which I did. That software isn’t compatible with current WordPress. The software I’m using now does not show me spam, so it’s difficult to rescue a comment.
  3. The solution in both cases: send me email and, if the comment is supposed to be attached to a post, say so. I’ll do that as appropriate.

Cites & Insights 17:3 (April 2017) available

Thursday, April 6th, 2017

Cites & Insights 17:3 (April 2017) is now available for downloading at

The 32-page 6″x9″ single-column issue* includes two essays:

The Art of the Beall   pp. 1-20

[Hat-tip to Phil Davis for the title.] The blacklists have “disappeared,” but not the blather. Almost entirely material from January 16, 2017 to April 3, 2017. And remember that a comprehensive study of journals that were on the lists and their article counts from 2012 through June 30, 2016 is available as C&I 17.1.

Libraries and Communities  pp. 21-32

If the first essay’s all recent material, this one’s not: items date from October 2009 to May 2014. Some thoughts on libraries and/in their communities, mostly by people better qualified to write about these things than I am.

*Reminder: Cites & Insights is now optimized for online/tablet reading. If you’re printing it out, I recommend having your PDF software print as a booklet, which should require 8 sheets of paper. Very slightly smaller type, good paper efficiency.
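The eight-sheet figure follows from how booklet printing works. Each sheet, printed on both sides and folded once, carries four pages. A booklet therefore needs the page count divided by four, rounded up, and any shortfall is filled with blank pages. Here is a short sketch. The function name is made up for illustration, and actual PDF software may pad or scale differently.

```python
def booklet_sheets(pages: int) -> tuple[int, int]:
    """Return (sheets, blank_pages) for a folded booklet.

    Assumes one sheet, printed on both sides and folded once, holds four pages.
    """
    if pages <= 0:
        raise ValueError("page count must be positive")
    sheets = -(-pages // 4)  # ceiling division without importing math
    blanks = sheets * 4 - pages
    return sheets, blanks


print(booklet_sheets(32))  # (8, 0) for a 32-page issue
```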

The Problems with Shen/Björk’s “420,000”

Monday, April 3rd, 2017

[This is Chapter 4 of Gray OA 2012-2016, a comprehensive study of journals on “the lists.”]


Cenyu Shen and Bo-Christer Björk published “‘Predatory’ open access: a longitudinal study of article volumes and market characteristics” in BMC Medicine 13, October 2015. (I’m bemused at the idea that this is a medical paper, but that’s a separate discussion.) I started questioning the paper’s conclusions as soon as it appeared, and continued to do so in my blog and in Cites & Insights.

Quite apart from the apparent assumption that Beall’s word is gospel when it comes to journals being “predatory”—an assumption I found, and find, appalling—I thought the numbers were implausible. The authors used a sample of 613 journals to assert that there were around 8,000 active “predatory” journals in 2014 and that those journals published around 420,000 articles in 2014 (up from around 310,000 in 2013 and 212,000 in 2012).

When presented with a case for the implausibility of the numbers, the authors responded that the article was peer-reviewed and used proper statistical methods. As I was writing this, I took the time to read the open reviewer comments on the article and the authors’ responses. Notably, all of the reviewers said they weren’t qualified to review the statistics, and there were certainly questions raised about the assumption that to be on Beall’s list was to be predatory.

The authors are right about one thing: looking at all the journals is a ridiculously large task. But that task showed that gray journals are just as heterogeneous as I thought they were, making it easy for a 6% sample to be wildly off base.

The First Cut

Now that I’ve done the work, the first note could be that the article’s 2014 figure has the first two digits reversed: it’s closer to 240,000 than to 420,000. Of course, the authors did not accidentally transpose digits; they came up with too-large results. Instead of 420,000 for 2014, 310,000 for 2013 and 212,000 for 2012, the figures should be 255,000 for 2014, 189,000 for 2013 and 125,000 for 2012 (rounding to the nearest thousand)—consistently between 59% and 61% of the article’s figures.
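The “59% to 61%” range can be checked directly from the six rounded figures in the paragraph above. Here is a minimal Python sketch. The variable names are illustrative.

```python
# Article counts in "predatory" journals: Shen/Björk estimates vs. the
# corrected figures from the full survey (rounded to the nearest thousand).
published = {2012: 212_000, 2013: 310_000, 2014: 420_000}
corrected = {2012: 125_000, 2013: 189_000, 2014: 255_000}

# Corrected figure as a fraction of the published estimate, per year.
ratios = {year: corrected[year] / published[year] for year in published}

for year, ratio in sorted(ratios.items()):
    print(f"{year}: {ratio:.1%} of the published estimate")
```

The loop prints roughly 59.0% for 2012, 61.0% for 2013 and 60.7% for 2014.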

“255,000 questionable as compared to 560,000 DOAJ” isn’t as astonishing as “nearly as many predatory as not.” That 420,000 figure has been cited a lot, mostly by critics of open access in general.

But there’s more to say…

The Second Cut

The authors were working from an earlier and much smaller pair of Beall lists than those I worked from. I used the Wayback Machine to download versions of the lists as close as possible to the versions they used (in both cases, later and presumably a little larger). Flagging publisher and journal listings from those earlier versions yields the figures in Table 4.1, including “UA” journals but excluding X-coded ones.

Table 4.1. Journals and articles based on Beall lists at time of Shen/Björk article

Now we’re down from 8,000 active journals to 2,692—and from 420,000 articles to just under 114,000. The percentages are still clustered: now the real numbers are 26% to 28% of those reported in the article. Even if you added 50% to my figures to account for a few dozen not-fully-counted journals (rather than the 5% to 10% I consider plausible), you’d be nowhere near 200,000, let alone 420,000. And, of course, 114,000 is a pretty small fraction of 560,000—just over one-fifth.
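Here is the arithmetic behind the last few sentences, as a small sketch. It treats “just under 114,000” as 114,000, so the results are upper bounds. It also uses the 560,000 DOAJ article figure cited earlier for comparison.

```python
second_cut_2014 = 114_000  # upper bound: the text says "just under 114,000"
shen_bjork_2014 = 420_000
doaj_2014 = 560_000

share_of_estimate = second_cut_2014 / shen_bjork_2014
# Deliberately generous 50% cushion for not-fully-counted journals
# (versus the 5% to 10% considered plausible).
cushioned = second_cut_2014 * 3 // 2
share_of_doaj = second_cut_2014 / doaj_2014

print(f"share of the 420,000 estimate: {share_of_estimate:.1%}")
print(f"with a 50% cushion: {cushioned:,} (below 200,000: {cushioned < 200_000})")
print(f"relative to DOAJ: {share_of_doaj:.1%}")
```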

Even those numbers involve the odd assumption that Beall’s tagging is definitive. What happens if we reduce the universe to those journals and publishers where Beall actually made a case?

The Final Cut

Table 4.2. Journals and articles where Beall made a case

Table 4.2 shows the results: fewer than 30,000 articles in 2014—about 7% of the article’s estimate. (The 2012 and 2013 figures are 6% to 7% of the article’s estimates.) These are cases where Beall not only listed a publisher or journal at the time the authors downloaded the lists, but actually made a case for the journals or publishers being questionable or “predatory.”
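Putting the three cuts side by side for 2014 shows how quickly the estimate shrinks as the assumptions tighten. Here is a minimal sketch that uses only the 2014 figures given in this chapter. The second and final cuts are upper bounds, since the text says “just under 114,000” and “fewer than 30,000.”

```python
SHEN_BJORK_2014 = 420_000

# 2014 article counts under each successively narrower definition.
cuts_2014 = {
    "first cut (full survey, later lists)": 255_000,
    "second cut (lists as of the article)": 114_000,  # "just under"
    "final cut (Beall made a case)": 30_000,  # "fewer than"
}

for label, articles in cuts_2014.items():
    print(f"{label}: {articles:,} = {articles / SHEN_BJORK_2014:.1%} of 420,000")
```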

Those numbers are too low—but they’re arguably what should have emerged from the study. As noted in Chapter 3, I believe realistic numbers are on the order of 120,000 for 2014; 90,000 for 2013; and 56,000 for 2012—still a lot of articles appearing in questionable journals, but not quite so alarmingly high.

What Went Wrong?

How could these two scholars be so far off? First there’s the assertion that all journals on Beall’s lists are actually predatory. Second, the “stratified” random sampling method involves some tricky assumptions, based on a “suspicion” that was “verified” by sampling all of ten journals—the suspicion “that journals from small publishers often publish a much higher number of articles than those of large publishers.”
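To see why a stratified design can be fragile, here is a minimal sketch of a stratified estimate of total articles. Each stratum's sample mean is multiplied by the number of journals in that stratum, and the products are summed. The strata, journal counts and sampled article counts below are entirely hypothetical. They are chosen only to show how much a single unusual journal in a small stratum sample can move the projected total.

```python
from statistics import mean


def stratified_total(strata):
    """Estimate a population total from a stratified random sample.

    `strata` maps a stratum name to (journals_in_stratum, sampled_article_counts).
    Each stratum contributes N_h * mean(sample_h).
    """
    total = 0.0
    for name, (n_journals, sample) in strata.items():
        if not sample:
            raise ValueError(f"stratum {name!r} has no sampled journals")
        total += n_journals * mean(sample)
    return total


# Hypothetical strata: NOT data from the Shen/Björk study or from Gray OA.
base = {
    "small publishers": (1_000, [40, 0, 120, 15]),
    "large publishers": (5_000, [0, 3, 10, 0, 7]),
}
# Same design, but one sampled small-publisher journal is a 1,200-article outlier.
outlier = {
    "small publishers": (1_000, [40, 0, 1_200, 15]),
    "large publishers": (5_000, [0, 3, 10, 0, 7]),
}

print(f"{stratified_total(base):,.0f}")     # 63,750
print(f"{stratified_total(outlier):,.0f}")  # 333,750
```

One outlier in a four-journal stratum sample raises the projection more than fivefold. The more heterogeneous the journals, the more easily a small sample can produce swings like that.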

The sampling used in this study yielded a much lower percentage of empty journals than my 100% survey. The article estimates that 67% of listings represent active journals; my 100% survey (admittedly of a larger list) shows 40% active journals. That’s an enormous difference: instead of 8,000 active journals from the smaller list, you wind up with around 4,800. That’s probably about right (I show 5,988—but that’s from a much larger list).
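The “around 4,800” figure comes from rescaling. If 8,000 active journals represent 67% of listings, the lists held roughly 11,940 listings, and 40% of that is about 4,800. Here is that calculation as a sketch. The function name is illustrative.

```python
def rescale_active(active_estimate, assumed_share, observed_share):
    """Re-derive an active-journal count under a different active share.

    active_estimate: active journals claimed under assumed_share
    assumed_share:   fraction of listings assumed active (the article's 67%)
    observed_share:  fraction found active in the full survey (40%)
    """
    implied_listings = active_estimate / assumed_share
    return implied_listings * observed_share


active = rescale_active(8_000, 0.67, 0.40)
print(round(active), round(active, -2))
```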

Beyond that, it appears that the sheer heterogeneity of journals makes projection from a small sample so dicey as to be useless. Unfortunately, I believe that to be the case.