I’ve finished the smallest but most annoying (and slowest) part of data gathering: rechecking the xm2/xx2 journals. It wasn’t entirely worthless–53 were now a (normal), five were now inactive (bi), one was now xj (no longer in DOAJ), and one was now dead/defunct (xd).
Unfortunately, the malware problem is much worse than last year: 335 xm2 (compared to 260 last year) and, sigh, 313 xm (compared to 190 last year). It’s mostly Indonesia: almost precisely 2/3 of all malware cases (439 out of 648), including 157 of 313 new cases and 282 of 335 continuing (xm2). Next highest is Brazil with 42 new and five continuing, then Italy, Russia and Spain with a dozen each and Venezuela with 11. Nearly all Indonesia cases are in universities: either they’re not aware that they’re carrying malware (almost always in entry menus for OA journals, not individual journals) or they just don’t care.
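For anyone who wants to check the Indonesia arithmetic, here is a minimal Python sketch. The counts come straight from the paragraph above; the variable names are mine, purely for illustration.

```python
# Malware counts quoted in the paragraph above.
new_cases = 313             # xm: new malware/certificate cases this year
continuing_cases = 335      # xm2: cases continuing from last year

indonesia_new = 157         # Indonesia's portion of the new cases
indonesia_continuing = 282  # Indonesia's portion of the continuing cases

total_cases = new_cases + continuing_cases
indonesia_total = indonesia_new + indonesia_continuing

# Share of all malware cases that are Indonesian, as a fraction of 1.
indonesia_share = indonesia_total / total_cases

print(f"Indonesia: {indonesia_total} of {total_cases} cases")
print(f"Share: {indonesia_share:.1%} (two thirds would be {2 / 3:.1%})")
```

The share comes out a little above two thirds, which fits "almost precisely 2/3."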
DOAJ has also been more actively removing journals this year than in the first four months of 2023: 249 journals compared to 41 last year. And there were more “not OA journals”–but only a dozen, mostly because when checked twice, they required login. (There are also two or three that are encyclopedias, not journals, and one that I find entirely mystifying–a journal devoted to one author that has neither dates nor issues but lists articles alphabetically; I’m unwilling to read each and every PDF to see what the dates are, if there are any.)
How often did I find a new URL (by searching on journal title + ISSN) that appeared to work properly as an entry point for the journal? 359, I believe.
The Big Numbers
In all, and barring cleanup during processing, 19,622 journals are fully analyzed, although 230 of those have no articles later than 2018 (xd). Those journals published 1,440,494 articles in 2023 and 1,445,733 in 2022–down just barely from last year and the first time there’s been a decline in overall articles (probably because a number of journals are very late with issue processing). NOTE: I’ve now moved xd journals to Excluded, where they belong. Doesn’t affect the article numbers, but does cut the overall number of fully-analyzed journals to 19,392.
18,430 journals are “normal” (a) and 510 had no 2022 or 2023 articles when checked (bi). The former is around 850 more journals than last year; the latter, 87 more.
Special Cases
Cases that will appear in regular tables:
- bi (inactive): 510
- xm (malware or certificate problems): 313
- xx (unavailable or unworkable): 139
Cases that will not be in regular tables and will be on the Exclusions page of the dataset:
- xd (dead/defunct): 230
- xj (removed from DOAJ): 249
- xm2 (continuing malware): 335
- xn (not an OA journal or uncountable): 12
- xx2 (continued unavailable/unworkable): 51
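The code counts above, together with the 18,430 "normal" (a) journals from The Big Numbers, can be checked against the totals with a short Python sketch. The dictionary names are hypothetical; the figures are the ones given in this post.

```python
# Codes that appear in the regular tables (a comes from The Big Numbers).
regular = {"a": 18430, "bi": 510, "xm": 313, "xx": 139}

# Codes that go on the Exclusions page.
excluded = {"xd": 230, "xj": 249, "xm2": 335, "xn": 12, "xx2": 51}

# Fully analyzed journals once xd has been moved to Excluded.
fully_analyzed = sum(regular.values())

# The earlier total, which still counted the xd journals.
fully_analyzed_with_xd = fully_analyzed + excluded["xd"]

# New plus continuing malware cases.
malware_total = regular["xm"] + excluded["xm2"]

print("Fully analyzed:", fully_analyzed)
print("Fully analyzed including xd:", fully_analyzed_with_xd)
print("All exclusions:", sum(excluded.values()))
print("All malware cases:", malware_total)
```

The four regular codes sum to 19,392, and adding the 230 xd journals gives the original 19,622.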
Next steps
It appears that I will make a preliminary version of the dataset available on my own website, probably on Thursday or Friday. It’s possible–but unlikely–that there will be slight changes in the final dataset, published when Gold Open Access 2024 is published.
Preliminary data now available
The preliminary data is now available at https://waltcrawford.name/g9_prefig.xlsx. I believe this dataset will be identical to the final g9 dataset on figshare, unless some surprises turn up during preparation of derivative data and the book.
Do be aware that Malwarebytes seems to think that all .name domains are potentially problematic. I’ve notified them that this isn’t so, and just as they fixed it previously I hope they’ll do so again–I’m not prepared to rebuild all of my files on another domain.
And beyond…
Now I’ll check data for conformity, add derived data (e.g., revenue figures for journals, size, growth…) and make sure the templates are working.
Then comes the “writing”–mostly generating tables and figures, but with some text as well. [Modified to allow for other priorities:] The book and formal data posting should be ready in the spring (that is, before the end of June).
Then, Diamond OA 2024, covering the 13,139 diamond OA journals that aren’t exclusions (almost precisely 2/3, but exactly 66.9605545%), mostly focusing on country-by-country. Figure another three to five weeks for that.
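The percentage quoted above appears to be 13,139 divided by 19,622, the fully analyzed count from before the xd journals were moved to Excluded. That denominator is my inference from the figures in this post, not something stated outright, so treat this Python sketch as a check under that assumption.

```python
diamond_journals = 13139

# Assumed denominators: both come from The Big Numbers, but which one
# the quoted percentage uses is an inference, not a stated fact.
total_with_xd = 19622     # fully analyzed, xd still included
total_without_xd = 19392  # fully analyzed after moving xd to Excluded

pct_with_xd = 100 * diamond_journals / total_with_xd
pct_without_xd = 100 * diamond_journals / total_without_xd

print(f"Against 19,622: {pct_with_xd:.7f}%")
print(f"Against 19,392: {pct_without_xd:.2f}%")
```

Only the 19,622 denominator reproduces the quoted figure to seven decimal places; against 19,392 the share comes out somewhat higher.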