I just finished the final data-gathering pass–rechecking all problematic journals. I was able to clear 105 additional journals and found 8 more that either had no post-2014 activity or were no longer in DOAJ. That left 639 xm journals and 91 xx journals. I’ve concluded that defective journals that are still defective after six checks over two years, or that had no post-2018 article counts, should not be included in the overall analysis. In all, 492 journals were excluded, leaving 260 malware and 74 unavailable/unworkable journals, all with article counts from DOAJ, retained for the analysis.
The big numbers: 15,130 fully analyzed journals, 69.7% of them without fees (“diamond” if you like). 14,175 of those showed 2020 articles (68.8% no-fee), for a total of 1,061,256 articles. The bad news: while the percentage of no-fee journals has stayed about constant at nearly 70%, the percentage of no-fee articles has fallen significantly, from 39% for 2019 articles (in GOA5) to 35.5% for 2020: in essence, nearly all the 2020 growth was in fee-charging journals. NOTE: These numbers may change slightly during additional checking–e.g., two journals have moved to the Excluded category, because neither had any post-2014 articles.
If you’re wondering: excluding all xm and xx journals would reduce the 2020 article count by 5,268 articles (but, of course, I only have article counts where journals were reporting them to DOAJ)–and including all xm and xx journals would increase the article count by 4,315 (same caveat applies). In other words, these decisions have almost no impact overall.
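For anyone who wants to check the "almost no impact" claim, here is a rough sketch of the arithmetic using only the figures quoted above. The variable and function names are made up for illustration and aren't part of the GOA spreadsheets.

```python
# Figures quoted in this post; names are illustrative only.
TOTAL_2020_ARTICLES = 1_061_256   # articles in fully analyzed journals
DROP_IF_ALL_EXCLUDED = 5_268      # articles lost if every xm/xx journal is dropped
GAIN_IF_ALL_INCLUDED = 4_315      # articles gained if every xm/xx journal is kept


def share_of_total(delta, total=TOTAL_2020_ARTICLES):
    """Return delta as a percentage of the total article count."""
    return 100 * delta / total


drop_pct = share_of_total(DROP_IF_ALL_EXCLUDED)
gain_pct = share_of_total(GAIN_IF_ALL_INCLUDED)

print(f"Excluding all xm/xx journals: -{drop_pct:.2f}% of 2020 articles")
print(f"Including all xm/xx journals: +{gain_pct:.2f}% of 2020 articles")
```

Either way the swing is under half a percent of the 2020 article count, with the same caveat as above: the counts only cover journals that report articles to DOAJ.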
I finished the data analysis on the same day as I did last year, despite having more than 1,000 additional journals: that speaks to fewer health and other interruptions, perhaps cleverer counting techniques, and perhaps fewer journals making it really hard to count articles. (Although some do try–including one where literally the only way to find dates is to read the articles and look for the recommended citation form!)
Next: add derived data, move columns around, and start the data processing and book writing. Anticipated completion date: somewhere around June 24-July 4, perhaps 2-3 weeks later for the country book.
I haven’t done a usage report for GOA4 and GOA5 for a while, and now that there’s a hosted copy of the GOA5 dataset with a dashboard elsewhere, I’m not sure how useful such reports are. I do know that book copies have declined considerably, from over 4,000 for GOA4 to around 740 for GOA5 (including two printed books). I find that discouraging, especially since the book includes caveats that aren’t in the dataset.
Meantime, on with the show. I surely hope we don’t hit “double 70” next year, with 70% of serious OA journals “diamond” but 70% of the articles appearing in the fee-charging journals….