What would happen if I replicated the sampling techniques actually used in the study (to the extent that I understand the article)?
I couldn’t precisely replicate the sampling. My working dataset had already been stripped of several thousand “journals” and quite a few “publishers,” and I took Beall’s lists a few months before Shen/Björk did. (In the end, the number of journals and “journals” in their study was less than 20% larger than in my earlier analysis, although there’s no way of knowing how many of those journals and “journals” actually published anything. In any case, if the Shen/Björk numbers had been 20% or 25% larger than mine, I would have said “sounds reasonable” and let it go at that.)
For each tier in the Shen/Björk article, I took two samples, both using random techniques, and for all but Tier 4, I used two projection techniques—one based on the number of active true gold OA journals in the tier, one based on all journals in the tier. (For Tier 4, singleton journals, there’s not enough difference between the two to matter much.) In each tier, I used a sample size and technique that followed the description in the Shen/Björk article.
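To make the two projection bases concrete, here is a minimal sketch. It is not the procedure from the Shen/Björk article or the spreadsheet work behind this post: the tier is synthetic, the article counts are invented, the sample size of 100 is arbitrary, and "active" is simplified to "has at least one article" (the real test also asks whether the journal is true gold OA). The function name `project_total` is hypothetical.

```python
import random

def project_total(sample, population_size):
    """Scale the mean article count of a sample up to a group of the given size."""
    if not sample:
        return 0
    return round(sum(sample) / len(sample) * population_size)

# Hypothetical tier of 1,000 "journals": many empty, most small, a few very large.
# All counts are invented for illustration; none come from the study or the survey.
rng = random.Random(2014)  # fixed seed so a rerun gives the same samples
tier = (
    [0] * 430
    + [rng.randint(1, 40) for _ in range(500)]
    + [rng.randint(200, 900) for _ in range(70)]
)
rng.shuffle(tier)

actual = sum(tier)                      # what a full count would find
active = [n for n in tier if n > 0]     # simplified stand-in for "active gold OA"

projections_all = []
projections_active = []
for _ in range(2):                      # two random samples, as in the text
    sample = rng.sample(tier, 100)      # assumed sample size
    # Basis 1: mean over every sampled journal, scaled to all journals in the tier.
    projections_all.append(project_total(sample, len(tier)))
    # Basis 2: mean over active sampled journals, scaled to active journals in the tier.
    sample_active = [n for n in sample if n > 0]
    projections_active.append(project_total(sample_active, len(active)))

print("actual count:        ", actual)
print("all-journal basis:   ", projections_all)
print("active-journal basis:", projections_active)
```

Changing the seed or the mix of large and small journals shows how far two honest random samples of the same tier can land from each other and from the counted total.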
The results were interesting. Extreme differences between the lowest sample and the highest sample include 2014 article counts for Tier 2 (publishers with 10 to 99 journals), the largest group of journals and articles, where the high sample was 97,856 and the low—actually, in this case, the actual counted figure—was 46,770: that’s a 2.09 to 1 range. There’s also maximum revenue, where the high sample for Tier 2 was $30,327,882 while the low sample (once again the counted figure) was $9,574,648: a 3.17 to 1 range—in other words, a range wide enough to explain the difference between my figures and the Shen/Björk figures purely on the basis of sample deviation. (It could be worse: the 2013 projected revenue figures for Tier 2 range from a high of $41,630,771 to a low of $8,644,820, a range of 4.82 to 1! In this case, the actual sum was just a bit higher than the low sample, at $8,797,861.)
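The range figures above are simple high-to-low ratios. As a quick check, this short sketch recomputes them from the Tier 2 numbers quoted in the paragraph; only the variable and function names are mine.

```python
# Tier 2 figures quoted in the text: (high sample, low figure).
tier2 = {
    "2014 articles": (97_856, 46_770),
    "2014 maximum revenue": (30_327_882, 9_574_648),
    "2013 maximum revenue": (41_630_771, 8_644_820),
}

def range_ratio(high, low):
    """Return how many times larger the high figure is than the low one."""
    return high / low

ratios = {label: range_ratio(high, low) for label, (high, low) in tier2.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} to 1")
```

Rounded to two places, the three ratios come out as 2.09, 3.17, and 4.82, matching the figures in the text.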
Once you add the tiers together, the extremes narrow somewhat. Table 7 shows the low, actual, and high total article projections; note that the 2013, 2012, and 2011 lows and highs might not be the actual extremes (I took the lowest and highest 2014 figures for each tier and used the other years’ figures from that same sample). It’s still a broad range for each year, but not quite as broad. (The actual numbers are higher than in earlier tables largely because journals in DOAJ had not been excluded at the time this dataset was captured.)
Table 7. Article projections by year, stratified sample
The range for 2014 is 1.54 to 1: broad, but narrower than in the first two attempts. On the other hand, the range for maximum revenues is larger than in the first two attempts: 2.18 to 1 for 2014 and a very broad 2.46 to 1 for 2013, as in Table 8.
Table 8. Maximum revenue projections, stratified sample
Note that the high figures here are pretty close to those offered by Shen/Björk, whereas the high mark for projected article count is still less than half that suggested by Shen/Björk. (Note also that in Table 7, the actual counts for 2013 and 2012 are lower than the lowest combined samples!)
For the graphically inclined, Figure 4 shows the low, actual and high projections for the third sample. This graph is not comparable to the earlier ones, since the horizontal axis is years rather than samples.
Figure 4. Estimated article counts by year, stratified
It’s probably worth noting that, even after removing thousands of “journals” and quite a few publishers in earlier steps, it’s still the case that only 57% of the apparent journals were actual, active gold OA journals—a percentage ranging from 55% for Tier 1 publishers to 61% for Tier 3.
It does appear that, for projected articles, the stratified sampling methodology used by Shen/Björk may work better than using a pure random sample across all journals—but for projected revenues, it’s considerably worse.
This attempt could answer the revenue discrepancy, which in any case is a much smaller discrepancy (as noted, my average APC per article is considerably higher than Shen/Björk’s)—but it doesn’t fully explain the huge difference in article counts.
I do not doubt that Shen/Björk followed sound statistical methodologies, which is quite different than agreeing that the Beall lists make a proper subject for study. The article didn’t identify the number of worthless articles or the amount spent on them; it attempted to identify the number of articles published by publishers Beall disapproved of in late summer 2014, which is an entirely different matter.
That set aside, how did the Shen/Björk sampling and my nearly-complete survey wind up so far apart? I see four likely reasons:
While Shen/Björk accounted for empty journals (but didn’t encounter as many as I did), they did not control for journals that have articles but are not gold OA journals. That makes a significant difference.
Sampling is not the same as counting, and the more heterogeneous the universe, the more that’s true. That explains most of the differences, I believe (on the revenue side, it can explain all of them).
The first two reasons, enhanced by two or three months of additional listings, combined to yield a much higher estimate of active journals than my survey: more than twice as many.
The second reason resulted in a much higher average number of articles per journal than in my survey (53 as compared to 36), which, combined with the doubled number of journals, neatly explains the huge difference in article counts.
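The way those two factors compound can be checked with a little arithmetic. In this sketch the per-journal averages (53 and 36) come from the text; the journal ratio is an assumption, set to exactly 2 as a floor because the text says only "more than twice as many."

```python
# Averages quoted in the text.
articles_per_journal_sample = 53   # Shen/Björk projection
articles_per_journal_survey = 36   # the nearly complete survey

# Assumption: "more than twice as many" active journals, taken here as exactly 2.
journal_ratio_floor = 2.0

per_journal_ratio = articles_per_journal_sample / articles_per_journal_survey
combined_floor = per_journal_ratio * journal_ratio_floor

print(f"articles per journal: {per_journal_ratio:.2f} times the survey figure")
print(f"combined effect: at least {combined_floor:.2f} times the survey's article count")
```

With those inputs the per-journal ratio is about 1.47 and the combined factor about 2.94, so the two effects together would put the projected article count close to three times the surveyed one, or more if the journal ratio is above 2.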
The net result is that, while Shen/Björk carried out a plausible sampling project, the final numbers raise needless alarm about the extent of “bad” articles. Even if we accept that all articles in these projections are somehow defective, which I do not, the total of such articles in 2014 appears to be considerably less than one-third of the number of articles published in serious gold OA journals (that is, those in DOAJ)—not the “nearly as many” the study might lead one to assume.
No, I do not plan to do a followup survey of publishers and journals in the Beall lists. It’s tempting in some ways, but it’s not a good use of my time (or anybody else’s time, I suggest). A much better investigation of the lists would focus on three more fundamental issues:
Is each publisher on the primary list so fundamentally flawed that every journal in its list should be regarded as ppppredatory?
Is each journal on the standalone-journal list actually ppppredatory?
In both cases, has Beall made a clear and cogent case for such labeling?
The first two issues are far beyond my ken; as to the first, there’s a huge difference between a publisher having some bad journals and it making sense to dismiss all of that publisher’s journals. (See my longer PPPPredatory piece for a discussion of that.)
Then there’s that final bullet…
[In closing: for this and the last three posts (yes, including the Gunslingers one), may I once again say how nice Word’s post-to-blog feature is? It’s a template in Word 2013, but it works the same way, and works very well.]