I’ve done as much crosschecking as makes sense at this point, and started on the second pass–around 2,000-2,400 journals to be looked at. That process can be rewarding but slow (an xx/xm journal can be restored in one of four ways, for example, each tried in turn). So I’ll just say “a couple of weeks” where “couple” means 1.5 to 3 or more–plus a week or so for final crosschecks and adding derived data.

The Data Questions

I’m considering some data retention/display changes:

  1. ISSN: I don’t believe this is serving any purpose, especially since a journal can have more than one. Before DOAJ added unique URLs, it was one way of identifying a journal, but it has never had any role in calculation or display. Unless I hear a good reason not to, this will disappear from the master & shared datasets. [Some amplification: Every DOAJ/URL in the spreadsheet points directly to the DOAJ page with one or both ISSNs for the journal, so there’s no loss of access whatsoever. And just looking at the Figshare data, you can’t tell whether it’s the “right” ISSN.]
  2. Fc (Fee code): I’m inclined to drop this because, now that I’m starting from DOAJ fee numbers, it’s not very useful or reliable. I’m not sure it ever was very useful.
  3. Count code: This has never appeared on Figshare, and was used for the first time this year to track where I was getting article counts for each journal. It’s interesting in a vague summary way (and has been in the weekly reports), but nothing more. I may or may not use it again in future GOAs, if any, but see no reason to add it to the shared spreadsheet.

Meanwhile, the P2 scan has yielded 398 journals that can be fully used and 515 exclusions, including xx2 and xm2 exclusions, with 735 more problematic journals to go and 864 journals that might have picked up more articles. Depending on how that goes, I might do a very fast rescan of the 417 xm2/xx2 journals. Still hoping to finish the prep work and start (but not finish) the book in May 2023. With luck.

  1. Marc Couture says:

    Hi Walt,

    Whenever I had to analyze groups of journals, ISSNs were essential, notably because of the various ways the name of some journals can be written in different lists or databases. The multiple ISSNs for some (many?) journals were not a problem for me. I must say though that I’ve not used the latest version(s) of the spreadsheet.

    Have a good day,


  2. Walt Crawford says:

    Hi Marc,
    And I would have agreed until DOAJ started assigning and storing unique URLs for the DOAJ page for each journal. I’ll think about it some more. Thanks for the comment.

  3. Jan Erik Frantsvåg says:

    I agree that the usefulness of ISSNs is less now that DOAJ has a unique URL that is consistent over time. It changes, though, if a journal leaves DOAJ and comes back again. Keeping the ISSNs in the file should not cost anything, so I think you could well keep them – and if someone wants to link to older data sets the unique URL won’t be there.

    While you don’t share everything publicly, will there be a possibility of asking you for more complete data sets for research purposes?

    And the source of article numbers could be interesting to have a look at.

    I am surprised you have come this far so early in the year!

    Jan Erik

  4. Walt Crawford says:

    Jan Erik,
    I don’t necessarily retain many years of data–and I start out with a fresh set of ISSNs each year, with no link attempts. I use *an* ISSN each year–and which one may change (based on which form of ISSN appears most frequently). The cost of storing ISSNs is the issue of minimizing data confusion–I would have said “preventing,” but I can’t claim 100% success in that regard.

    In fact, so far, I’ve shared whatever is there (and not derived), so there’s no more complete dataset.

    But I am now convinced that I should retain ISSNs, at least this year, thanks to you and Marc C.

    Now if I could get some feedback about my idea of replacing the Country book with a Diamond OA book…