On October 1, 2015 (yesterday, that is), I posted “The Gold OA Landscape 2011-2014: malware and some side notes,” including this paragraph:
Second, a sad note. An article–which I’d seen from two sources before publication–that starts by apparently assuming Beall’s lists are something other than junk, then bases an investigation on sampling from the lists, has appeared in a reputable OA journal and, of course, is being picked up all over the place…with Beall being quoted, naturally, thus making the situation worse. I was asked for comments by another reporter (haven’t seen whether the piece has appeared and whether I’m quoted), and the core of my comments was that it’s hard to build good research based on junk, and I regard Beall’s lists as junk, especially given his repeated condemnation of all OA–and, curiously, his apparent continuing belief that author-side charges, which in the Bealliverse automatically corrupt scholarship, only happen in OA (page charges are apparently mythical creatures in the Bealliverse). So, Beall gains even more credibility; challenging him becomes even more hopeless.
When I’d looked at the article, twice, I’d had lots of questions about the usefulness of extrapolating article volumes and, indeed, active journal numbers from a rather small sampling of journals within an extremely heterogeneous space–but, glancing back at my own detailed analysis of journals in those lists (which, unlike the article, was a full survey, not a sampling), I was coming up with article volumes that, while lower, were somewhere within the same ballpark (although the number of active journals was less than half that estimated in the article). (The article is “‘Predatory’ open access: a longitudinal study of article volumes and market characteristics” by Cenyu Shen and Bo-Christer Björk; it’s just been published.)
Basically, the article extrapolated 8,000 active “predatory” journals publishing around 420,000 articles in 2014, based on a sampling of fewer than 700 journals. And, while I showed only 3,876 journals (I won’t call them “predatory” but they were in the junk lists) active at some point between 2011 and June 2014, I did come up with a total volume of 323,491 articles–so I was focusing my criticism of the article on the impossibility of basing good science on junk foundations.
Now, go back and note the italicized word two paragraphs above: “glancing.” Thanks to an email exchange with Lars Bjørnshauge at DOAJ, I went back and read my own article more carefully–that is, actually reading the text, not just glancing at the figures. Turns out 323,491 is the total volume of articles for 3.5 years (2011 through June 30, 2014). The annual total for 2013 was 115,698; the total for the first half of 2014 was 67,647, so it’s fair to extrapolate that the 2014 annual total would be under 150,000.
That’s a huge difference: not only is the article’s active-journal total more than twice as high as my own (non-extrapolated, based on a full survey) number, the article total is nearly three times as high. That shouldn’t be surprising: the article is based on extrapolations from a small number of journals in an extremely heterogeneous universe, and all the statistical formulae in the world don’t make that level of extrapolation reliable.
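For anyone who wants to check the arithmetic, here is a quick sketch in Python using only the figures quoted above. The variable names are mine, and doubling the first-half count is a deliberately crude assumption (it ignores any growth during the year), which is why the comparison is also run against the 150,000 ceiling.

```python
# Figures quoted in this post; variable names are illustrative only.
my_first_half_2014 = 67_647      # my count, January 1 through June 30, 2014
my_active_journals = 3_876       # my count, active at some point 2011 to mid-2014
article_journals = 8_000         # Shen and Björk's estimate of active journals
article_articles_2014 = 420_000  # Shen and Björk's estimate of 2014 articles
ceiling_2014 = 150_000           # the "under 150,000" upper bound used above

# Crude assumption: the second half of 2014 matches the first half.
my_2014_doubled = 2 * my_first_half_2014

journal_ratio = article_journals / my_active_journals
article_ratio_doubled = article_articles_2014 / my_2014_doubled
article_ratio_ceiling = article_articles_2014 / ceiling_2014

print(f"Doubled first-half 2014 count: {my_2014_doubled:,}")
print(f"Journal estimate vs. my count: {journal_ratio:.2f}x")
print(f"Article estimate vs. doubled count: {article_ratio_doubled:.2f}x")
print(f"Article estimate vs. 150,000 ceiling: {article_ratio_ceiling:.2f}x")
```

Doubling the half-year figure gives 135,294 articles, so the article's estimate comes out at roughly 3.1 times that figure and 2.8 times the 150,000 ceiling.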
Shen and Björk ignored my work, either because it’s not Properly Published or because they weren’t aware of it (although I’m pretty sure Björk knows of my work). They say “It would have taken a lot of effort to manually collect publication volumes” for all the journals on the list. That’s true: it was a lot of effort. Effort which I carried out. Effort which results in dramatically lower counts for the number of active journals and articles.
(As to the article’s “geographical spread of articles,” that’s based on a sample of 205 articles out of what they seem to think are about 420,000. But I didn’t look at authors so I won’t comment on this aspect.)
I should note that “active” journals are those that published at least one article at any time during the period. Since I did my analysis in late 2014 and cut off article data at June 30, 2014, it’s not surprising that the “active this year” count is lower for 2014 (3,014 journals) than for 2013 (3,282)–and I’ll agree with the article that recent growth in these journals has been aggressive: the count of active journals was 2,084 for 2012 and 1,450 for 2011.
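To put a number on “aggressive,” here is a small sketch of the year-over-year growth in active journals, using the counts above. It leaves out 2014 because that figure covers only half a year; the dictionary and variable names are just for illustration.

```python
# Active-journal counts from my survey (full years only).
active_by_year = {2011: 1_450, 2012: 2_084, 2013: 3_282}

years = sorted(active_by_year)

# Growth rate for each year relative to the year before it.
growth = {
    later: (active_by_year[later] - active_by_year[earlier]) / active_by_year[earlier]
    for earlier, later in zip(years, years[1:])
}

for year, rate in growth.items():
    print(f"{year}: {rate:.1%} more active journals than {year - 1}")
```

That works out to roughly 44% growth from 2011 to 2012 and about 57% from 2012 to 2013.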
I could speculate as to whether what I regard as seriously faulty extrapolations based on a junk foundation will get more or less publicity, citations, and credibility than counts based on a full survey–but carried out by an independent researcher using wholly transparent methodology and not published in a peer-reviewed journal. I know how I’d bet. I’d like to hope I’m wrong. (If not being peer-reviewed is a fatal problem, then a big issue in the study goes away: the junk lists are, of course, not at all peer reviewed.)