GOA7: First pass complete

I’ve finished scanning the 17,302 journals for Gold Open Access 2016-2021.

At the end of that pass, there are 14,572 journals with 2021 articles recorded, for a total of 1,231,397 2021 articles.

But, as usual, there were a lot of problematic journals–1,106 that were either unavailable or not working properly, and 674 with malware or security-certificate issues. These will all be revisited, as will 445 journals that showed no 2021 articles (but no signs of problems) and 225 where at least one 2021 article appeared but it seemed likely that there should have been more.
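As a rough tally of the Pass 2 workload, the four counts above can simply be added up. This is only a sketch: it assumes the four groups do not overlap, which the post does not say outright, and the variable names are made up for illustration.

```python
# Journals flagged for a second look, using the counts quoted above.
# Assumption: the four groups are disjoint.
revisit = {
    "unavailable or not working properly": 1106,
    "malware or security-certificate issues": 674,
    "no 2021 articles, no sign of problems": 445,
    "some 2021 articles, probably more": 225,
}
total_scanned = 17302

pass2_total = sum(revisit.values())
pass2_share = pass2_total / total_scanned

for label, count in revisit.items():
    print(f"{count:>6,}  {label}")
print(f"{pass2_total:>6,}  total ({pass2_share:.1%} of {total_scanned:,} journals)")
```

On these numbers, roughly one journal in seven gets a second visit.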

Oddly enough, I completed Pass 1 on the same date (April 20) as last year, despite checking just under 2,000 more journals. I credit that to fewer emergencies (so far), consistently good broadband and computer performance, restoring use of a direct Excel-to-browser function that had stopped working, and more consistency in many journal webpages. (I also tweeted every day on progress, mostly as a personal goad. It worked.)

Comparing this year’s Pass 2 to last year’s:

  • Last year, I checked 2,211 journals for possible added articles; this year, I’ll be checking 445 that had no 2021 articles and 225 that seem likely to have more. That really compares to the 946 last year flagged during the pass: I was keeping track of comparable numbers and saw no reason to do another algorithmic pass. (See pp. 230-231 of GOA6 if you want to know what that’s about.) This scan goes rapidly; I’d hope for considerably less than a week.
  • I’ll be deleting problematic journals removed from DOAJ since 1/1/2022 after the remaining pieces rather than before: that should not affect more than 30-40 journals, and since the intent is to be an “end of 2021” snapshot, it seems reasonable.
  • The scan for “xx”–unavailable or unworkable–will involve 1,106 journals, much worse than last year’s 732. Quite a few of these are 404s because Dergisi Park (Turkey) stopped autoforwarding from its old .gov.tr domain to its new .org domain; a few more are because of an oddity with one SciELO instance that means if you already have browser tabs open for two SciELO journals, it rejects any other attempts. Those can all be fixed, and I hope to clear up several hundred of the xx cases (some clear themselves up–e.g., one university’s server was apparently down on one day). This may be a slow process (the 732 took a week).
  • The scan for “xm” (malware and certificate issues) will involve 674 journals, slightly better than last year’s 781, but still about 674 too high. That process, and additional checking for recalcitrant “xx” cases, may take a while. Last year, I completed the final scan on May 19; I’ll be delighted if I do as well this year. After that come a few days of data normalization and about a month to prepare the book and mount the dataset at figshare.
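For anyone who wants the year-over-year picture at a glance, the comparisons in the list above reduce to a little arithmetic. A minimal sketch: the counts come straight from the list, the added-article figure for this year is taken as 445 + 225, and the names `pass2` and `change` are invented for the example.

```python
# (last year, this year) journal counts for each Pass 2 scan, as quoted above.
pass2 = {
    "added-article check": (946, 445 + 225),
    "xx (unavailable or unworkable)": (732, 1106),
    "xm (malware and certificate issues)": (781, 674),
}

def change(last, this):
    """Return the absolute and relative change from last year to this year."""
    diff = this - last
    return diff, diff / last

for scan, (last, this) in pass2.items():
    diff, rel = change(last, this)
    print(f"{scan}: {last:,} -> {this:,} ({diff:+,}, {rel:+.1%})")
```

The xx scan is the one that grew, by about half; the other two shrank.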

So, well, no real target date, but if emergencies continue to be few and mild, the data and main book might–might–be ready in June. (The country book, which will be very different this year because it will focus on the “long tail,” journals not published by one of the Big 9 or 10, would be ready a few weeks later.)

I may continue to tweet progress reports (I’m always waltcrawford), probably not every day. And if you or your institution want to encourage the continuation of this series, consider buying one or all of the trade paperbacks at lulu.com. I won’t get rich (they’re priced by rounding production cost up to the next 50-cent mark), but I spend a lot of care on making sense of the data and think the print book is a good way to see what I’ve found. But, of course, there will also be a free PDF of each book at my website and a free dataset at figshare, both CC-BY.

Now, to start Pass 2.

Oh: my prediction for overall article count is “probably around 1.3 million”–that is, somewhere between 1.23 million (no articles added in Pass 2: very unlikely) and a few tens of thousands more.

Incidentally: this scan included 15,055 journals that continued from previous years and 2,247 added to DOAJ in 2021 (most of them not new that year). As always, a few hundred journals disappeared–and all but about 120 were explicitly removed from DOAJ during 2021.
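A quick sanity check, for the curious: the continuing and newly added journals should add up to the 17,302 scanned, and they do. The variable names below are just for illustration.

```python
# Counts quoted in this post.
continuing = 15055     # journals carried over from previous years
added_in_2021 = 2247   # journals added to DOAJ in 2021
total_scanned = 17302  # journals scanned in Pass 1

# The two groups should account for every journal scanned.
assert continuing + added_in_2021 == total_scanned

added_share = added_in_2021 / total_scanned
print(f"Added in 2021: {added_share:.1%} of the journals scanned")
```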

