I’ve completed the first portion of the second pass, asked some questions and got some answers, and had a new thought. Here goes:
ISSNs
I’m satisfied that ISSNs do serve (some) purpose in the spreadsheet, so I’ll keep them–and, perhaps to make them a bit more useful, when I do final cleanup I’ll see to it that e-ISSNs are used in all cases where available.
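A minimal sketch of that cleanup rule, assuming each journal row carries a print ISSN and an optional e-ISSN. The function name, field names, and sample ISSN values are hypothetical placeholders, not taken from the actual spreadsheet:

```python
def preferred_issn(print_issn, e_issn):
    """Return the e-ISSN when one is available, otherwise the print ISSN."""
    e = (e_issn or "").strip()
    p = (print_issn or "").strip()
    return e if e else p


# Made-up rows for illustration only.
rows = [
    {"title": "Journal A", "issn": "1111-1111", "eissn": "2222-2222"},
    {"title": "Journal B", "issn": "3333-3333", "eissn": ""},
]

for row in rows:
    row["issn_final"] = preferred_issn(row["issn"], row["eissn"])
    print(row["title"], row["issn_final"])
```

Falling back to the print ISSN keeps every row populated even when no e-ISSN exists.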
Pass 2, Part 1: problematic journals
This involved around 1,200 journals–mostly xx and xm (but not xx2 or xm2). This is a slogging process (with up to four paths to try to find a “good” site), but definitely productive. (Some 20 journals that should have been in pass 2 part 2–now part 3, see below–were accidentally included here, which does no harm.)
At the end of the scan, I had 307 journals that could be excluded (xx, xm, xn, xo) and 926 journals that are good to go. The latter include about 44,000 2022 articles; the former perhaps 3,100. In practice, most of the 307 journals will be included–all except those that aren’t really journals or are both unfindable and no longer in DOAJ.
Given how well that went, I’ll add another partial check before the scan of 864 journals that seem plausibly likely to have had more 2022 issues added since they were scanned. With the new check becoming Part 2 and that scan becoming Part 3, those journals will have had four full months to do late additions.
The new Part 2 is a quick scan of the 416 xx2 and xm2 journals–ones that have been problematic for more than one year. Basically, I’ll check each URL; any that are actually available (not xx or xm), I’ll scan properly and count as restored. I will be surprised (pleasantly) if there are more than a couple of dozen of these: journals that are bad for two years tend to stay bad (or get removed from DOAJ). UPDATE: see next post. I did a fuller check, and was indeed pleasantly surprised.
Best guess: that quick scan should take two or three days. Part 3 may use the rest of the week, maybe more (there are real-world things that interfere). With a lot of luck, I might be done with data gathering by the end of next week, setting the stage for normalization and adding derived data (e.g., peak articles, revenue, categories of size and price).
New data issues
As already noted, I’ll keep ISSNs.
Having heard no comments to the contrary, I’ll drop fee code from the spreadsheet. (Count code was never in the spreadsheet.)
I’m now looking at code “bx”–available at a different URL. It can happen for any number of reasons. In some previous years, I didn’t actually change the URL in the spreadsheet. I do that now. Last year there were 699 such cases; the year before that, 730. This year there are 438. I don’t believe they add anything to the spreadsheet: they’re part of the data-gathering process. Unless I hear reasons not to, I’ll change them to “a,” which will then be a clean code for “active” in 2021-2022.
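A rough sketch of that recoding, assuming the status codes can be read out as a simple list. The function name and the sample values are hypothetical; only the codes themselves ("bx", "a", "xx", "xm") come from the discussion above:

```python
def recode_status(code):
    """Fold the data-gathering code 'bx' into 'a'; leave every other code alone."""
    return "a" if code == "bx" else code


# Made-up sample of status codes for illustration only.
codes = ["a", "bx", "xx", "bx", "xm"]
cleaned = [recode_status(c) for c in codes]
print(cleaned)
```

Because only "bx" is touched, the problem codes stay distinguishable from active journals after the change.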