First things first: If you’re in a position to help resolve some of the very large number of journals with malware (787) or ones that were unreachable or unworkable (752), there’s a spreadsheet with the key information for all of them here:
https://docs.google.com/spreadsheets/d/19gXpn3kVn-R33uDdOUSHgssPaLO5CEPRWUGPvDB9az0/edit?usp=sharing
And I won’t do the final piece of the multistep “second pass” until at least May 15. Help from folks with colleagues in Indonesian or Brazilian academia would be most helpful. (The spreadsheet, g6x, is sorted first by code, then by country, then by publisher, then by journal. The second page lists the codes and notes.)
Here’s where things stand. 15,666 journals had 1,018,364 articles in 2020, up from 890,069 articles in 2019. The 2020 number will rise somewhat, both because some journals are late to publish issues and because the numbers don’t include *any* of the malware and unreachable journals (but the 2019 numbers do). For GOA5, the 2019 total was 854,018 articles.
The 15,666 (yes, I know, I say 15,667 sometimes–it’s hard to remember to subtract one for the row of labels) include:
- 13,391 “a”–regular–journals
- 317 “bi”–no articles in 2019 or 2020, mostly ceased, renamed, changed publishers or otherwise disappeared
- 3 bm–early cases of journals with malware that could be reached through other addresses
- 343 bx–journals available at a different URL than the one in DOAJ. There will probably be quite a few more of these; nearly all at present are either Sciendo (from DeGruyter) or dergipark, moved from .gov to .org without generally changing DOAJ records.
- 58 xd: journals with no articles later than 2014, most of them “duplicates” that have been superseded.
- 787 xm: Malware
- 14 xn: Apparently not OA.
- 1 xt: A website I couldn’t translate or make enough sense of to count
- 752 xx: Unreachable (404, etc.) or unworkable (db errors, etc.)
So far, I see 4,371 journals with fees, 9,706 with no fees, and a few hundred needing rechecking (mostly newly-added journals that are xm or xx).
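As a quick sanity check on the arithmetic above, here is a minimal Python sketch that adds up the per-code counts and works out the year-over-year change in articles. The dictionary and variable names are made up for illustration; only the numbers come from this post.

```python
# Journal counts per status code, copied from the list above.
counts = {
    "a": 13_391,   # regular journals
    "bi": 317,     # no articles in 2019 or 2020
    "bm": 3,       # malware, reachable through other addresses
    "bx": 343,     # available at a different URL than the one in DOAJ
    "xd": 58,      # no articles later than 2014
    "xm": 787,     # malware
    "xn": 14,      # apparently not OA
    "xt": 1,       # could not be translated
    "xx": 752,     # unreachable or unworkable
}

total = sum(counts.values())
print(f"total journals: {total:,}")  # should match the 15,666 quoted above

# Article totals quoted earlier in the post.
articles_2020 = 1_018_364
articles_2019 = 890_069

# Relative change from 2019 to 2020 (before the second pass adds more).
growth = (articles_2020 - articles_2019) / articles_2019
print(f"article growth: {growth:.1%}")
```

The per-code counts do add up to 15,666, and the interim article count works out to roughly 14% above 2019.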
Now, after ignoring journals for a day or two, I’ll recheck 2,211 journals for added issues/articles and 1,613 to try to clear malware and unreachable cases. (The 2,211 includes 946 cases marked along the way and 1,265 where there were at least 1.5 times as many articles in 2019 as in 2020–the original version of this paragraph had incorrect numbers here; fortunately, the correction means fewer to check.)
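The recheck rule in that paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `flagged` parameter, and the sample rows are hypothetical, and skipping journals with no 2019 articles is a guess rather than something the actual spreadsheet is known to do.

```python
def needs_recheck(articles_2019: int, articles_2020: int,
                  flagged: bool = False, ratio: float = 1.5) -> bool:
    """Return True if a journal should be rechecked for added issues/articles.

    A journal qualifies if it was marked along the way (`flagged`) or if it
    had at least `ratio` times as many articles in 2019 as in 2020.
    """
    if flagged:
        return True
    # Assumption: a journal with no 2019 articles has no drop to investigate.
    if articles_2019 == 0:
        return False
    return articles_2019 >= ratio * articles_2020


# Hypothetical rows: (journal, 2019 articles, 2020 articles, flagged by hand)
sample = [
    ("Journal A", 150, 100, False),  # exactly 1.5x, so it qualifies
    ("Journal B", 149, 100, False),  # just under the threshold
    ("Journal C", 40, 45, True),     # marked along the way
    ("Journal D", 0, 0, False),      # nothing in either year
]

to_recheck = [name for name, a19, a20, flag in sample
              if needs_recheck(a19, a20, flag)]
print(to_recheck)

# The two groups named in the post add up to the stated recheck total.
recheck_total = 946 + 1_265
print(f"journals to recheck: {recheck_total:,}")
```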
As already noted, the final malware pass will start no earlier than May 15. If all goes well, the primary book and spreadsheet should be ready in very late June or early July.