GOA9: Progress Report

May 16th, 2024

I’m working on Gold Open Access 2024, and my best guess is that it will be complete some time this month–most likely between May 24 and May 31. “Complete” means that the paperback is available from Lulu, the free PDF is available on my website, and the dataset is on Figshare.

A best guess is just that.

If you’ve been using the preliminary dataset: there will be changes–around 30 additional rows, and one odd change that does not affect article counts.

Best guess for Diamond OA 2024, barring unusual crises, is three to five weeks after Gold Open Access 2024 is complete–maybe more, almost certainly not less. So very late June or early July, if crossing my fingers doesn’t make typing too slow.

GOA9: Preliminary dataset now available

May 2nd, 2024

A preliminary version of the GOA9 dataset is now available at https://waltcrawford.name/g9_prefig.xlsx. I believe this dataset will be identical to the final g9 dataset on Figshare, unless some surprises turn up during preparation of derivative data and the book.

The dataset proper includes 19,392 journals; the exclusions page has another 877 (including the “xd” journals). For more notes, see https://walt.lishost.org/2024/04/goa9-end-of-data-gathering/

[Note: crosschecking to prepare derivative data showed two keyboarding errors; they have been corrected and the preliminary dataset has been replaced.]

Do note that Malwarebytes seems to have restored its blanket categorization of all .name domains as suspicious…and I have no idea when or whether they’ll (once again) refine that overbroad warning.

GOA9: End of data gathering

April 30th, 2024

I’ve finished the smallest but most annoying (and slowest) part of data gathering: rechecking the xm2/xx2 journals. It wasn’t entirely worthless–53 were now a (normal), five were now inactive (bi), one was now xj (no longer in DOAJ), and one was now dead/defunct (xd).

Unfortunately, the malware problem is much worse than last year: 335 xm2 (compared to 260 last year) and, sigh, 313 xm (compared to 190 last year). It’s mostly Indonesia: almost precisely 2/3 of all malware cases (439 out of 648), including 157 of 313 new cases and 282 of 335 continuing (xm2). Next highest is Brazil with 42 new and five continuing, then Italy, Russia and Spain with a dozen each and Venezuela with 11. Nearly all Indonesia cases are in universities: either they’re not aware that they’re carrying malware (almost always in entry menus for OA journals, not individual journals) or they just don’t care.
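
For anyone who wants to check the Indonesia share, here is a minimal Python sketch using only the counts quoted in the paragraph above. The variable names are illustrative and are not column names from the dataset.

```python
# Counts quoted in the paragraph above (variable names are illustrative).
xm_new = 313            # new malware cases this year (xm)
xm2_continuing = 335    # continuing malware cases (xm2)
indonesia_new = 157
indonesia_continuing = 282

total_malware = xm_new + xm2_continuing
indonesia_total = indonesia_new + indonesia_continuing
indonesia_share = indonesia_total / total_malware

print(total_malware, indonesia_total, f"{indonesia_share:.1%}")
# prints: 648 439 67.7%
```

That is slightly above two-thirds, which is consistent with "almost precisely 2/3."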

DOAJ has also been more actively removing journals this year than in the first four months of 2023: 249 journals compared to 41 last year. And there were more “not OA journals”–but only a dozen, mostly because when checked twice, they required login. (There are also two or three that are encyclopedias, not journals, and one that I find entirely mystifying–a journal devoted to one author that has neither dates nor issues but lists articles alphabetically; I’m unwilling to read each and every PDF to see what the dates are, if there are any.)

How often did I find a new URL (by searching on journal title + ISSN) that appeared to work properly as an entry point for the journal? 359, I believe.

The Big Numbers

In all, and barring cleanup during processing, 19,622 journals are fully analyzed, although 230 of those have no articles later than 2018 (xd). Those journals published 1,440,494 articles in 2023 and 1,445,733 in 2022–up just barely from last year and the first time there’s been a decline in overall articles (probably because a number of journals are very late with issue processing). NOTE: I’ve now moved xd journals to Excluded, where they belong. Doesn’t affect the article numbers, but does cut the overall number of fully-analyzed journals to 19,392.

18,430 journals are “normal” (a) and 510 had no 2022 or 2023 articles when checked (bi). The former is around 850 more journals than last year; the latter, 87 more.

Special Cases

Cases that will appear in regular tables:

  • bi (inactive): 510
  • xm (malware or certificate problems): 313
  • xx (unavailable or unworkable): 139

Cases that will not be in regular tables and will be on the Exclusions page of the dataset:

  • xd (dead/defunct): 230
  • xj (removed from DOAJ): 249
  • xm2 (continuing malware): 335
  • xn (not an OA journal or uncountable): 12
  • xx2 (continued unavailable/unworkable): 51
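
The counts in this post can be cross-checked against each other. This is a minimal Python sketch that assumes the four regular-table codes (a, bi, xm, xx) together make up the fully analyzed set, and that the five excluded codes make up the Exclusions page. The dictionary names are illustrative.

```python
# Counts from "The Big Numbers" and the two lists above.
regular = {"a": 18_430, "bi": 510, "xm": 313, "xx": 139}
excluded = {"xd": 230, "xj": 249, "xm2": 335, "xn": 12, "xx2": 51}

fully_analyzed = sum(regular.values())
exclusions = sum(excluded.values())

print(fully_analyzed)                    # 19392
print(exclusions)                        # 877
print(fully_analyzed + excluded["xd"])   # 19622, the count before xd was moved
```

Under those assumptions, the totals agree with the 19,392 journals and 877 exclusions reported for the preliminary dataset.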

Next steps

It appears that I will make a preliminary version of the dataset available on my own website, probably on Thursday or Friday. It’s possible–but unlikely–that there will be slight changes in the final dataset, published when Gold Open Access 2024 is published.

Preliminary data now available

The preliminary data is now available at https://waltcrawford.name/g9_prefig.xlsx. I believe this dataset will be identical to the final g9 dataset on Figshare, unless some surprises turn up during preparation of derivative data and the book.

Do be aware that Malwarebytes seems to think that all .name domains are potentially problematic. I’ve notified them that this isn’t so, and just as they fixed it previously I hope they’ll do so again–I’m not prepared to rebuild all of my files on another domain.

And beyond…

Now I’ll check data for conformity, add derived data (e.g., revenue figures for journals, size, growth…) and make sure the templates are working.

Then comes the “writing”–mostly generating tables and figures, but with some text as well. [Modified to allow for other priorities:] The book and formal data posting should be ready in the spring (that is, before the end of June).

Then, Diamond OA 2024, covering the 13,139 diamond OA journals that aren’t exclusions (almost precisely 2/3, but exactly 66.9605545%), mostly focusing on country-by-country. Figure another three to five weeks for that.
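
The exact percentage can be reproduced with a short calculation. This minimal Python sketch assumes the denominator is the 19,622 fully analyzed journals counted before the xd journals were moved to Excluded. That is an inference from the arithmetic, not something stated in the post.

```python
diamond = 13_139

# Assumption: denominator is the fully analyzed count before the xd move.
before_xd_move = 19_622
# For comparison: the count after xd journals were moved to Excluded.
after_xd_move = 19_392

print(f"{diamond / before_xd_move:.7%}")  # 66.9605545%
print(f"{diamond / after_xd_move:.1%}")   # 67.8%
```

The first figure matches the percentage quoted above. Against the smaller 19,392 base the share would be a little higher.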


GOA9: End of Pass 2 (start of Pass 2b)

April 27th, 2024

I’ve completed Pass 2, problematic journals and those where it seemed plausible that a new 2023 issue might emerge. Along the way, I also retested 35 of the xm2/xx2 journals, and–to my surprise–found five that could be resurrected by looking for other URLs. So, given that there are only 456 of these, I’ll check them one more time.

At this point–omitting those 456–there are 19,563 included journals and 250 excluded journals. The 250 exclusions account for only 438 articles in 2023, which is hardly surprising (the total peaked at 8,511 in 2020). Nearly all of the exclusions so far, 239 of them, are journals removed from DOAJ since 1/1/2024; the other 11 are not OA journals–four encyclopedias and seven journals that now require login with no other obvious means of access (checked twice).

The 19,563 account for 1,441,817 articles in 2023 and 1,443,423 in 2022. 333 of them are at different URLs than the ones that appear in DOAJ. 6,470 have fees and 13,093–just over two-thirds–do not.

Special cases

  • bi [inactive: no articles since 2021]: 521
  • xd [dead: no articles since 2018]: 229
  • Note that many bi and xd have been continued under other names
  • xm [malware, but not in 2023]: 313
  • xx [unavailable or unworkable, but not in 2023]: 138

Yes, there are still far too many malware cases…especially given the xm2 still to be rechecked.

Now, on to the mercifully brief final pass. Then, some time off, and time to massage data, add derived data, and write the book(s).

Oh, and you’ve still got a few days to chime in on https://walt.lishost.org/2024/04/goa9-should-i-post-a-preliminary-dataset/

GOA9: Should I post a preliminary dataset?

April 26th, 2024

Here’s the question: Should I post a preliminary dataset for GOA9 when I finish the final data-gathering pass, rather than waiting until the book is ready (my usual practice)?

I’ll finish Pass 2 (rechecking most problematic cases that weren’t also problematic last year) this weekend. I’ve looked at enough of the “xm2/xx2” cases (ones that were problematic this year and last) to believe that it may be worth the few days required to recheck those. So the data will be ready some time next week–probably May 2-4.

At that point, after a day off, I do some normalization, then start adding derivative data (e.g., revenue for each fee-charging journal, size and cost brackets), then prepare the book–probably a four or five week process. (The Diamond OA by Country book gets done after I’ve published the book and uploaded the dataset to Figshare–probably in mid-July.)

Of course, I’d really like people to read the text treatment before using the data, and it is possible that normalization could result in some data changes (probably very few, maybe none). Ideally, I’d like a few people to buy the print book (always priced at the nearest half-dollar over actual production costs), but that seems like a lost cause…

So: If I post this, there could be some changes–and I would NOT be tracking or posting those changes.

I’ll decide around May 2, based on feedback. [The early post would add maybe an hour or two of work; that’s not a decision point.] Please leave feedback in comments, in email (waltcrawford@gmail.com), or to waltcrawford on Mastodon. (I’ve been off the deadbird for months now.)

What say you?

Gold/Diamond Scan, Toward Second Pass

April 12th, 2024

I’ve now done several cleanup things:

  • Checked to see whether any of the journals removed from DOAJ since 1/1/2024 had been restored by 4/12/2024. None had; those journals are all treated as excluded.
  • Compared codes for last year’s study with those this time around. If a journal is “xm” or “xx” and was also “xm” or “xx” last year, I stop checking it. There were a lot of these–adding those to the “xj” (removed) journals yields 948 journals that I marked as excluded and won’t recheck. Perhaps worth noting that those 948 only account for 9,757 articles in 2024 (where articles could be counted through other means) and 16,574 in 2023.
  • At this point, there are 2,510 journals to be rechecked and 17,177 complete (and not excluded, but including journals no longer publishing). Those journals account for 1,393,000 articles in 2023; 1,381,039 in 2022.
  • The rescan will start today and cover 2,510 journals. I’ll do the usual daily Mastodon posts. Hoping for two weeks…but I do some additional checking, so it could take longer.

Gold/Diamond OA Scan, End of First Pass

April 10th, 2024

In all, 20,269 journals have been checked, with (so far) a total of 1,425,237 articles in 2023 and 1,453,420 in 2022. Just barely more than two-thirds of the journals are diamond (no fee): 13,596, as compared to 6,673 with fees.

The second pass will involve 3,073 journals, including most of those noted below (except bi and xd) and another 1,200 or so that seem likely to have published another 2023 issue since they were checked.

Special cases

  • bi [Inactive: no articles since 2021]: 494
  • xd [Dead: no articles since 2018]: 224
  • xj [Removed from DOAJ since 1/1/24]: 119
  • xm [Malware and certificate problems]: 867
  • xn [Not OA, including–mostly–ones that now require login]: 22
  • xx [Unavailable or unworkable]: 848

Based on past experience, that last category is likely to yield the most positive results, although one can always hope that many of the xm cases will be resolved.

Next steps

After a day or two off [filing taxes and other fun stuff], I’ll save off the journals that don’t need rechecking, do some comparisons of the remainder with last year’s problems and exclusions, and start the (typically slower) second pass. With luck, cross fingers, I’ll be done by the end of April and can begin adding derived numbers and writing the books.

Gold/Diamond OA Journal Scan, Penultimate Week (Week 14)

April 7th, 2024

So far, 19,775 journals scanned (519 remain), with 1,396,763 articles in 2023, 1,425,512 in 2022. Of these journals, 6,545 have fees; 13,230 do not (diamond or platinum OA). Some 3,021 will be rechecked, including all of those below (except possibly xj) and others where it seems likely that 2023 issues have appeared since the first scan.

Barring big issues, this scan will be done this week, followed by some massaging and two or three weeks for the rescan.

Special cases

  • bi [Inactive: no articles since 2021]: 481
  • xd [Defunct: no articles since 2018]: 215
  • xj [No longer in DOAJ]: 114
  • xm [Malware and certificate issues]: 864 (noting that cases where Malwarebytes traps an outbound malware call but allows the session to continue are NOT counted as malware)
  • xn [Not an OA journal–either not a journal or now requires login]: 22
  • xx [Unable to reach or, in a very few cases, unworkable]: 833

There may or may not be another wrapup at the end of the scan, before working toward the rescan.

Gold/Diamond Journal Scan, Real Week Thirteen Summary

March 31st, 2024

Journals scanned so far: 18,400. Articles: 1,317,670 for 2023, 1,348,422 for 2022. 6,065 have fees of some sort; 12,335 do not (diamond OA). I’ll have to look at 2,868 again, including all of the ones below.

Special Cases

  • bi [Inactive: no articles since 2021]: 449
  • xd [Defunct, no articles since 2018]: 205
  • xj [Apparently removed from DOAJ since 1/1/2024]: 97
  • xm [Malware and certificate problems]: 836
  • xn [Not an OA journal, including ones that now require login]: 19
  • xx [Unavailable or unworkable]: 818

The scan will not be finished next week, but, barring unforeseen problems with real life, it will be done some time during the following week.

Gold/Diamond Scan, Week Twelve

March 24th, 2024

MODIFIED LATER ON 3/24: It would appear that a whole bunch of “refusals” from one Indonesian university were a temporary outage–so I’m resaving this with modified, and better, numbers.

MODIFIED 3/31: This was, of course, week twelve, not week thirteen.


17,000 journals scanned, with 1,285,602 2023 articles and 1,315,137 2022 articles. While 5,619 journals have fees, 11,381 are diamond: no fees. Some 2,556 will need to be rechecked.

A tough week in some ways, as you may see if you compare special case numbers with the previous week (which I am deliberately not going to do!)

Special cases

  • bi [no articles since 2021]: 424
  • xd [no articles since 2018]: 191
  • xj [apparently removed from DOAJ since 1/1/2024]: 86
  • xm [malware and certificate/ssl problems]: 642
  • xn [not an OA journal, including journals that seem to require login]: 19
  • xx [unavailable or unworkable]: 785

I’m out of the universidad/e range and into Indonesian journals. So far, it seems as though some of the malware cases in Indonesian universities may be better.

Stopping a bit early today to work on April requirements some more. Should need about 2.5 weeks to do the remaining 3,296 or so journals…but that rapidly-growing number of “recheck” cases is a bit discouraging.