GOA7: Preliminary baseline

I believe I’ve now completed the online work for Gold Open Access 2016-2021 (GOA7), to be followed by a day or three of consistency/typo checking, a few days of adding data (persistent DOAJ urls for ongoing work, GOA6 fees and status for comparisons, and various columns of derived data), and several weeks of massaging data and preparing the book. Current hope is mid- to late June for the main book and figshare dataset, a few weeks later for the new “long tail” country book. I’m nearly certain the main book will not be ready in May, and it’s possible that emergencies and problems could push it into July, but “sometime in June” is probable.

So where do things stand, with the understanding that consistency checks may cause numbers to shift very slightly?

Refining Problematic-Journal Coding

Last year, the xm (malware) and xx (unavailable/unworkable) codes each covered two groups: journals that had the same problem for two or more years, which were excluded, and journals where the problem was new, which were included.

This year, I refined the coding–adding a few new codes, all of which result in exclusion from the overall study:

  • x2: xm in one year, xx in another. One journal, no 2021 articles.
  • xm2: Malware this year and last. 383 journals (of which 47 come from Brazil, 276 from Indonesia, and 23 from Ukraine), of which DOAJ says 193 had 2021 articles, a total of 5,367 2021 articles.
  • xmi: Malware this year and no articles later than 2019. Nine journals.
  • xo: No longer in DOAJ and problematic in some other way. 119 journals.
  • xx2: unavailable/unworkable this year and last. Twenty journals, two with 2021 articles (22 articles).
  • xxi: unavailable and with no DOAJ-listed articles since 2019. 27 journals.

So the excluded page in the eventual Figshare spreadsheet will include 658 journals (including 89 xd and 10 non-OA journals)–about 160 more than last year, but 119 of those are no longer in DOAJ, so this is actually an improvement.
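For anyone who wants to check the arithmetic, here is a minimal sketch that adds up the exclusion counts quoted above. The dictionary name and layout are only illustrative, not the structure of the actual spreadsheet; the counts are the ones given in this post.

```python
# Journal counts per exclusion code, as quoted in this post.
# The dictionary itself is a hypothetical layout for illustration only.
excluded_counts = {
    "x2": 1,       # xm in one year, xx in another
    "xm2": 383,    # malware this year and last
    "xmi": 9,      # malware this year, no articles later than 2019
    "xo": 119,     # no longer in DOAJ and otherwise problematic
    "xx2": 20,     # unavailable/unworkable this year and last
    "xxi": 27,     # unavailable, no DOAJ-listed articles since 2019
    "xd": 89,      # xd journals, count as given above
    "non-OA": 10,  # non-OA journals
}

excluded_total = sum(excluded_counts.values())
still_in_doaj = excluded_total - excluded_counts["xo"]

print(f"Excluded journals: {excluded_total}")
print(f"Excluded but still in DOAJ: {still_in_doaj}")
```

The second figure simply sets aside the 119 xo journals, which is the basis for calling this year's count an improvement.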

The most encouraging thing is that there are relatively few new malware cases: 142 in all, compared to 260 last year. Of the 142, 96 are from Indonesia; no other country has more than five. There are slightly more unavailable/unworkable cases (90 compared to 75), but that’s not bad.

The Baseline

Subject to small further refinement, here’s what I see, by code:

Code    Journals    2021 content    2021 articles
a         15,305          14,876        1,242,250
bi           391
bx           699             666           29,096
xm           142              85            2,600
xx            90              16            1,124
Total     16,627          15,643        1,275,070
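The totals row can be rechecked with a short sketch like the one below. It assumes the blank cells in the bi row count as zero toward the totals, which is my reading of the table rather than something stated outright; the variable names are illustrative.

```python
# Baseline rows as (journals, journals with 2021 content, 2021 articles).
# None marks a blank cell; treating it as zero is an assumption.
baseline = {
    "a": (15305, 14876, 1242250),
    "bi": (391, None, None),
    "bx": (699, 666, 29096),
    "xm": (142, 85, 2600),
    "xx": (90, 16, 1124),
}

def column_total(index):
    """Sum one column across all codes, counting blank cells as zero."""
    return sum(row[index] or 0 for row in baseline.values())

totals = tuple(column_total(i) for i in range(3))

print(f"Journals: {totals[0]:,}")
print(f"With 2021 content: {totals[1]:,}")
print(f"2021 articles: {totals[2]:,}")
```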

Again, subject to refinement…but probably not major changes. That compares to last year’s 15,128 fully analyzed journals and 1,061,256 2020 articles.

