Careful reading and questionable extrapolation

On October 1, 2015 (yesterday, that is), I posted “The Gold OA Landscape 2011-2014: malware and some side notes,” including this paragraph:

Second, a sad note. An article–which I’d seen from two sources before publication–that starts by apparently assuming Beall’s lists are something other than junk, then bases an investigation on sampling from the lists, has appeared in a reputable OA journal and, of course, is being picked up all over the place…with Beall being quoted, naturally, thus making the situation worse. I was asked for comments by another reporter (haven’t seen whether the piece has appeared and whether I’m quoted), and the core of my comments was that it’s hard to build good research based on junk, and I regard Beall’s lists as junk, especially given his repeated condemnation of all OA–and, curiously, his apparent continuing belief that author-side charges, which in the Bealliverse automatically corrupt scholarship, only happen in OA (page charges are apparently mythical creatures in the Bealliverse). So, Beall gains even more credibility; challenging him becomes even more hopeless.

When I’d looked at the article, twice, I’d had lots of questions about the usefulness of extrapolating article volumes and, indeed, active journal numbers from a rather small sampling of journals within an extremely heterogeneous space–but, glancing back at my own detailed analysis of journals in those lists (which, unlike the article, was a full survey, not a sampling), I was coming up with article volumes that, while lower, were somewhere within the same ballpark (although the number of active journals was less than half that estimated in the article). (The article is “‘Predatory’ open access: a longitudinal study of article volumes and market characteristics” by Cenyu Shen and Bo-Christer Björk; it’s just been published.)

Basically, the article extrapolated 8,000 active “predatory” journals publishing around 420,000 articles in 2014, based on a sampling of fewer than 700 journals. And, while I showed only 3,876 journals (I won’t call them “predatory” but they were in the junk lists) active at some point between 2011 and June 2014, I did come up with a total volume of 323,491 articles–so I was focusing my criticism of the article on the impossibility of basing good science on junk foundations.

Now, go back and note the italicized word two paragraphs above: “glancing.” Thanks to an email exchange with Lars Bjørnshauge at DOAJ, I went back and read my own article more carefully–that is, actually reading the text, not just glancing at the figures. Turns out 323,491 is the total volume of articles for 3.5 years (2011 through June 30, 2014). The annual total for 2013 was 115,698; the total for the first half of 2014 was 67,647, so it’s fair to extrapolate that the 2014 annual total would be under 150,000.

That’s a huge difference: not only is the article’s active-journal total more than twice as high as my own (non-extrapolated, based on a full survey) number, the article total is nearly three times as high. That shouldn’t be surprising: the article is based on extrapolations from a small number of journals in an extremely heterogeneous universe, and all the statistical formulae in the world don’t make that level of extrapolation reliable.
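For anyone who wants to check the arithmetic behind those comparisons, here is a minimal sketch using only the figures quoted in this post. The variable names are made up for illustration, and simply doubling the first-half count is an assumption about how one might project 2014 (the post itself only says the annual total would be under 150,000).

```python
# Figures quoted in the post (full survey vs. the article's extrapolation).
first_half_2014_articles = 67_647   # survey: January through June 30, 2014
survey_active_journals = 3_876      # survey: active at some point, 2011 to mid-2014
article_est_journals = 8_000        # Shen and Björk estimate for 2014
article_est_articles = 420_000      # Shen and Björk estimate for 2014
stated_2014_ceiling = 150_000       # "under 150,000" as stated in the post

# ASSUMPTION: project the full year by doubling the first half.
naive_2014_articles = 2 * first_half_2014_articles

journal_ratio = article_est_journals / survey_active_journals
ratio_vs_ceiling = article_est_articles / stated_2014_ceiling
ratio_vs_naive = article_est_articles / naive_2014_articles

print(f"Naive 2014 projection: {naive_2014_articles:,} articles")
print(f"Journal estimate vs. survey: {journal_ratio:.2f}x")
print(f"Article estimate vs. 150,000 ceiling: {ratio_vs_ceiling:.2f}x")
print(f"Article estimate vs. naive projection: {ratio_vs_naive:.2f}x")
```

Against the 150,000 ceiling, the article estimate comes out at 2.8 times as high; against the simple doubling, it is a bit over three times as high.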

Shen and Björk ignored my work, either because it’s not Properly Published or because they weren’t aware of it (although I’m pretty sure Björk knows of my work). They say “It would have taken a lot of effort to manually collect publication volumes” for all the journals on the list. That’s true: it was a lot of effort. Effort which I carried out. Effort which results in dramatically lower counts for the number of active journals and articles.

(As to the article’s “geographical spread of articles,” that’s based on a sample of 205 articles out of what they seem to think are about 420,000. But I didn’t look at authors so I won’t comment on this aspect.)

I should note that “active” journals includes those that published at least one article any time during the period. Since I did my analysis in late 2014 and cut off article data at June 30, 2014, it’s not surprising that the “active this year” count is lower for 2014 (3,014 journals) than for 2013 (3,282)–and I’ll agree with the article that recent growth in these journals has been aggressive: the count of active journals was 2,084 for 2012 and 1,450 for 2011.

I could speculate as to whether what I regard as seriously faulty extrapolations based on a junk foundation will get more or less publicity, citations, and credibility than counts based on a full survey–but carried out by an independent researcher using wholly transparent methodology and not published in a peer-reviewed journal. I know how I’d bet. I’d like to hope I’m wrong. (If not being peer-reviewed is a fatal problem, then a big issue in the study goes away: the junk lists are, of course, not at all peer reviewed.)



4 Responses to “Careful reading and questionable extrapolation”

  1. Obinna Ojemeni says:

    Dear Professor Walt Crawford,

    I would like to say that your work on Beall’s list has been insightful for my proposed doctoral study, because I identified bias by Beall towards publishing efforts from Africa: he placed HINDAWI on his watchlist in 2012, listed it as predatory in 2013 and finally delisted it in 2014 after an appeal process. Your work has enabled me to focus on just those OA journals that are faulted for unethical practices instead of relying solely on Beall’s list.

    As regards the geographical spread of authors as reported by Shen & Björk (2015), I found it a bit sloppy that it drew conclusions from only 205 articles when there are about 420,000 articles in total. The study by Nwagwu & Ojemeni (2015), which used a sample of 5,601 biomedical articles published by two Nigerian OA publishers regarded as predatory, seems fair but was not captured in their study either.

  2. Walt Crawford says:

    Obinna Ojemeni: It’s just Walt, not Professor. Beyond that:

    *I don’t believe there are 420,000 articles from journals on that list in 2014; I believe there are about 150,000.

    *Even among 150,000, as you say, 205 articles is such a tiny sample that it’s hard to give much credence to the results.

    I focus most of my work on serious OA journals (those listed in DOAJ); there are 30 such journals in Nigeria, although it’s notable (and unfortunate) that all but two of them charge article processing charges. That’s the lowest percentage of free OA journals of any country with at least four DOAJ-listed journals.

  3. Obinna Ojemeni says:


    Thanks for pointing that out for me; it was simply courtesy. I was criticised by someone who reviewed my PhD pre-proposal for ignoring your works, so I have since consulted them to the best of my ability. I believe they bring balance to the OA discourse.

    As you observed, DOAJ lists about 30 OA journals published in Nigeria. I have been using the Directory for my seminar paper, where I identified 36 OA journals, while an earlier study stated 39. I wonder if the remaining 3 were delisted and whether all of them are yet to get the new approval seal from DOAJ.

    However, my interest in DOAJ is just to identify the extent to which Nigerian universities adopt and utilize OA initiatives, since little or no research would take place in the country without OAJ, due to the lack of subscriptions to Toll Access journals in most, if not all, universities.

  4. Walt Crawford says:

    Thanks for the additional comment.

    DOAJ does have a list showing which journals have been de-listed. (And “about 30” is just that: I didn’t go back and count the ones that didn’t fit into my grades A&B.)

    Good luck with your work as described in your final paragraph. Much more of this is likely to be needed.