Archive for April, 2021

GOA6: First pass completed

Wednesday, April 21st, 2021

First things first: If you’re in a position to help resolve some of the very large number of journals with malware (787) or ones that were unreachable or unworkable (752), there’s a spreadsheet with the key information for all of them here:

https://docs.google.com/spreadsheets/d/19gXpn3kVn-R33uDdOUSHgssPaLO5CEPRWUGPvDB9az0/edit?usp=sharing

And I won’t do the final piece of the multistep “second pass” until May 15 at the earliest. Help from anyone with colleagues in Indonesian or Brazilian academia would be especially welcome. (The spreadsheet, g6x, is sorted first by code, then by country, then by publisher, then by journal. The second sheet lists the codes and notes.)
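For anyone who wants to reproduce that ordering on an export of the spreadsheet, here is a minimal Python sketch. It assumes each row has been read into a record with code, country, publisher and journal-title fields; the `Journal` class, its field names and the sample rows are hypothetical placeholders, not the actual g6x layout.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    # Hypothetical record layout; the real g6x columns may differ.
    code: str
    country: str
    publisher: str
    title: str


def sort_for_followup(journals: list[Journal]) -> list[Journal]:
    """Order rows by code, then country, then publisher, then journal title."""
    return sorted(journals, key=lambda j: (j.code, j.country, j.publisher, j.title))


# Placeholder rows, not real journals.
rows = [
    Journal("xx", "Indonesia", "Publisher 2", "Journal D"),
    Journal("xm", "Brazil", "Publisher 1", "Journal B"),
    Journal("xm", "Brazil", "Publisher 1", "Journal A"),
    Journal("xm", "Indonesia", "Publisher 3", "Journal C"),
]
for j in sort_for_followup(rows):
    print(j.code, j.country, j.publisher, j.title, sep=" | ")
```

Because the sort key is a tuple, ties on code fall through to country, then publisher, then title, which groups all the malware (xm) cases for one country and publisher together.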

Here’s where things stand. 15,666 journals published 1,018,364 articles in 2020, up from 890,069 articles in 2019. The 2020 number will rise somewhat, both because some journals are late to publish issues and because the numbers don’t include *any* of the malware and unreachable journals (but the 2019 numbers do). For GOA5, the 2019 total was 854,018 articles.

The 15,666 (yes, I know, I sometimes say 15,667; it’s hard to remember to subtract one for the row of labels) include:

  • 13,391 “a”–regular–journals
  • 317 “bi”–no articles in 2019 or 2020, mostly ceased, renamed, changed publishers or otherwise disappeared
  • 3 bm–early cases of journals with malware that could be reached through other addresses
  • 343 bx–journals available at a different URL than the one in DOAJ. There will probably be quite a few more of these; nearly all at present are either Sciendo (from De Gruyter) or DergiPark, which moved from .gov to .org generally without changing DOAJ records.
  • 58 xd: journals with no articles later than 2014, most of them “duplicates” that have been superseded.
  • 787 xm: Malware
  • 14 xn: Apparently not OA.
  • 1 xt: A website I couldn’t translate or make enough sense of to count
  • 752 xx: Unreachable (404, etc.) or unworkable (db errors, etc.)
  • So far, I see 4,371 journals with fees, 9,706 with no fees, and a few hundred needing rechecking (mostly newly added journals that are xm or xx).
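As a quick arithmetic check, the per-code counts above do add up to 15,666. The Python sketch below shows that sum, plus how skipping the label row keeps a raw row count from coming out one too high. The `tally_codes` helper and the three-journal sheet are hypothetical, for illustration only.

```python
from collections import Counter


def tally_codes(rows: list[list[str]], code_col: int = 0) -> Counter:
    """Count journals per status code.

    rows[0] is assumed to be the label row and is skipped; forgetting
    that is what turns 15,666 journals into "15,667".
    """
    return Counter(row[code_col] for row in rows[1:])


# Per-code counts as reported in this post.
reported = Counter({
    "a": 13_391,  # regular journals
    "bi": 317,    # no articles in 2019 or 2020
    "bm": 3,      # malware, but reachable at another address
    "bx": 343,    # available at a URL other than the DOAJ one
    "xd": 58,     # nothing after 2014, mostly superseded duplicates
    "xm": 787,    # malware
    "xn": 14,     # apparently not OA
    "xt": 1,      # couldn't translate or count
    "xx": 752,    # unreachable or unworkable
})
print(sum(reported.values()))  # 15666

# Hypothetical sheet: one label row plus three journals.
sheet = [["code", "journal"], ["a", "J1"], ["xm", "J2"], ["a", "J3"]]
print(tally_codes(sheet))  # Counter({'a': 2, 'xm': 1})
```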

Now, after ignoring journals for a day or two, I’ll recheck 2,211 journals for added issues/articles and 1,613 to try to clear malware and unreachable cases. (The 2,211 consist of 946 cases marked along the way and 1,265 where there were at least 1.5 times as many articles in 2019 as in 2020. The original version of this paragraph had incorrect numbers here; fortunately, the correction means fewer to check.)
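The recheck rule above can be expressed as a small predicate. This is a sketch under assumptions: the function name, the per-journal tuple layout and the sample journals are hypothetical, and journals with no 2019 articles are excluded from the ratio test so that a 0-and-0 journal isn’t flagged by accident.

```python
def needs_recheck(articles_2019: int, articles_2020: int,
                  flagged: bool = False, ratio: float = 1.5) -> bool:
    """Decide whether a journal goes on the recheck list.

    A journal qualifies if it was marked during the first pass, or if it had
    at least `ratio` times as many articles in 2019 as in 2020 (a hint that
    2020 issues may still be arriving). Journals with no 2019 articles are
    not flagged by the ratio test.
    """
    if flagged:
        return True
    return articles_2019 > 0 and articles_2019 >= ratio * articles_2020


# Hypothetical journals: name -> (2019 articles, 2020 articles, marked?)
journals = {
    "J1": (120, 60, False),  # 2019 is twice 2020: recheck
    "J2": (90, 80, False),   # modest drop: no recheck
    "J3": (40, 45, True),    # marked along the way: recheck
}
to_check = [name for name, (a19, a20, marked) in journals.items()
            if needs_recheck(a19, a20, marked)]
print(to_check)  # ['J1', 'J3']
```

The two sources of rechecks simply add up: 946 marked cases plus 1,265 ratio cases gives the 2,211 above.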

As already noted, the final malware pass will start no earlier than May 15. If all goes well, the primary book and spreadsheet should be ready in very late June or early July.

GOA6: Ninth Update

Saturday, April 10th, 2021


Time for another GOA6 checkpoint, at 14,400 of 15,676.

Note that, as always, I sort journals by publisher before checking, because many multijournal publishers use the same templates for all their journals, which makes it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked partway through the University of Isfahan. So far, the 2020 article count is 942,685, and that will go up. The 2019 total for this set of journals is 830,018 articles.

Last year, that range of publishers included 12,835 journals, which published 792,068 articles in 2019. So there’s a net gain of 1,562 journals so far. A million articles overall still seems likely, but not certain.

For this group of 1,600 journals (ignoring the first 12,600), problematic journals include 308 malware cases and 75 or so unreachable/unworkable ones. Yes, that’s a terribly high malware ratio.

Looking more closely at the malware cases for these 1,600 journals, there are ten security-certificate problems, seven ransomware, ten malware, 24 phishing and 256 Trojans.

The problem is mostly Indonesia: 842 of the 1,600 journals in this group are from Indonesia, and 281 of those have malware, mostly at the root URL for a university’s set of journals.

I checked all 14,400 journals scanned so far. Of 764 total malware cases, 481 are from Indonesia. Brazil is a distant second at 121, with smaller clusters from Romania and Spain (and a few cases elsewhere). Yes, Indonesia has more DOAJ-listed journals than any other country, but 481 of Indonesia’s 1,745 (so far) are problematic; Brazil has the second-most journals, and 121 of 1,578 are problematic. (All these figures exclude the remaining 1,276 journals–but only 26 of those are from Indonesia.)
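To put those counts side by side, here is a short Python calculation of the problem rates, using only the figures in the paragraph above (`share` is a hypothetical helper name).

```python
def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Figures for the first 14,400 journals scanned.
indonesia = share(481, 1_745)           # problematic Indonesian journals
brazil = share(121, 1_578)              # problematic Brazilian journals
indonesia_of_malware = share(481, 764)  # Indonesia's part of all malware cases

print(f"Indonesia: {indonesia:.1f}% of its journals")  # 27.6%
print(f"Brazil: {brazil:.1f}% of its journals")        # 7.7%
print(f"Rate ratio: {indonesia / brazil:.1f}x")         # 3.6x
print(f"Indonesia's share of malware cases: {indonesia_of_malware:.1f}%")  # 63.0%
```

So roughly 28% of Indonesia’s journals are problematic versus about 8% of Brazil’s: the gap is in the rate, not just the larger journal count.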

I believe attempts have been made to alert publishers to malware problems, and more may be made this year. This is a continuing problem.

I’d say it’s now nearly certain that the first scan will be done in late April, barring illness or other unexpected events. That would leave some checking and the long rescans. (So far, about 2,200 journals need rechecking; the final number will probably exceed 2,300. Rechecking can be a slow process.)

So no overall target date yet…