Gold Open Access 6: Early Notes

Monday, December 28th, 2020

I’m almost ready to start data gathering for Gold Open Access 6 (2014-2019). On Thursday afternoon after 4 pm (that is, after midnight January 1 UTC), I’ll do a final download of DOAJ metadata and check the Adds & Deletions spreadsheet, deleting any journals removed from today (December 28) through December 31 and massaging the metadata rows for journals added since December 27 into new rows of the master sheet I use to gather data. I’m guessing there will be no more than one or two deletions and perhaps a dozen additions since my last check at 12:30 am UTC on December 28. [That guess is based on the fact that three titles were deleted and 41 added between December 15 and December 28.]
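The post describes this add/delete step as spreadsheet work, but its logic can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual workflow: each row is assumed to be a dict keyed by ISSN, and the field names (`issn`, `title`, `added`, `new`), the function `update_master`, and the ISSNs in the example are all hypothetical placeholders.

```python
from datetime import date

# Hypothetical layout: master rows and DOAJ rows are dicts keyed by "issn";
# "title", "added" and "new" are assumed field names, not DOAJ's export schema.

def update_master(master, deleted_issns, doaj_rows, since):
    """Drop deleted journals, then append DOAJ rows added on or after `since`."""
    deleted = set(deleted_issns)
    kept = [row for row in master if row["issn"] not in deleted]
    known = {row["issn"] for row in kept}
    for row in doaj_rows:
        if row["added"] >= since and row["issn"] not in known:
            # Reshape the DOAJ row into the (assumed) master-sheet layout.
            kept.append({"issn": row["issn"], "title": row["title"], "new": True})
            known.add(row["issn"])
    return kept

# Placeholder ISSNs, for illustration only.
master = [
    {"issn": "1111-1111", "title": "Journal A", "new": False},
    {"issn": "2222-2222", "title": "Journal B", "new": False},
]
doaj = [
    {"issn": "3333-3333", "title": "Journal C", "added": date(2020, 12, 29)},
    {"issn": "4444-4444", "title": "Journal D", "added": date(2020, 12, 20)},
]
updated = update_master(master, ["2222-2222"], doaj, since=date(2020, 12, 27))
print([row["issn"] for row in updated])  # ['1111-1111', '3333-3333']
```

Keying on ISSN keeps a journal from being added twice if it shows up in more than one download.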

I downloaded early so I could do a more thorough job of checking consistency and, for the first time in years, recheck subject assignments against the subjects and keywords in the DOAJ rows. That process included catching errors from previous years (most of them from VERY early years) and being somewhat more consistent in ambiguous cases: for example, putting more journals that cover sustainability into Ecology, nearly all journals on nutrition into Medicine, and nearly all journals with tourism as a primary focus into Anthropology, and generally replacing Technology with more specific subjects where appropriate.
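A keyword check can flag journals whose DOAJ keywords suggest one of the reassignments described above. This is a hypothetical sketch (the `RULES` list and `suggest_subject` function are invented for illustration). It only suggests candidates for manual review, since judgments like “nearly all” and “primary focus” don’t reduce to keyword matches.

```python
# Hypothetical keyword rules drawn from the examples in the post. They only
# flag candidates; "nearly all" and "primary focus" still call for judgment.
RULES = [
    ("sustainability", "Ecology"),
    ("nutrition", "Medicine"),
    ("tourism", "Anthropology"),
]

def suggest_subject(current, doaj_keywords):
    """Return a rule's subject if it fires and differs from `current`, else None."""
    words = {kw.strip().lower() for kw in doaj_keywords}
    for keyword, subject in RULES:
        if keyword in words and subject != current:
            return subject
    return None

print(suggest_subject("Technology", ["Tourism", "Hospitality"]))  # Anthropology
print(suggest_subject("Medicine", ["nutrition"]))                 # None
```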

Processing the December 15 download yielded 13,528 matches against the GOA5 master spreadsheet, plus 2,103 new titles. Of the unmatched GOA5 journals, 456 were explicitly removed from DOAJ and 90 are the usual “small number of mysteries,” most of them cases where a journal changed both its ISSN and its normalized URL during the past year. Including the additions and deletions made yesterday, there are currently 15,668 titles in the master spreadsheet; the final number should be slightly higher. Whereas GOA5 wound up with slightly fewer than 14,000 fully analyzed journals, GOA6 will very likely include significantly more than 15,000. Will it reach a million articles? Probably not, but we shall see… somewhere between June and September, depending on health, other activities, and how difficult the manual checking turns out to be.
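Matching last year’s master sheet against a new DOAJ download can be sketched as an ISSN-first, URL-second lookup. The sketch also shows why a journal that changes both its ISSN and its URL ends up among the “mysteries.” The normalization rules in the hypothetical `normalize_url` (dropping the scheme, a leading `www.`, and a trailing slash, and lowercasing the host) are assumptions, as are the field names and the example data.

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Hypothetical normalization: drop scheme, 'www.' and trailing slash; lowercase the host."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")

def match_journals(old_rows, new_rows):
    """Match last edition's rows to a new DOAJ download by ISSN, then by normalized URL."""
    by_issn = {row["issn"]: row for row in new_rows}
    by_url = {normalize_url(row["url"]): row for row in new_rows}
    matched, unmatched, used = [], [], set()
    for old in old_rows:
        hit = by_issn.get(old["issn"]) or by_url.get(normalize_url(old["url"]))
        if hit is None:
            # Removed from DOAJ, or a "mystery" (e.g., both ISSN and URL changed).
            unmatched.append(old)
        else:
            matched.append((old, hit))
            used.add(hit["issn"])
    new = [row for row in new_rows if row["issn"] not in used]
    return matched, unmatched, new

# Placeholder data, for illustration only.
old = [
    {"issn": "1111-1111", "url": "https://www.a.example/j/"},
    {"issn": "2222-2222", "url": "http://b.example/j"},
    {"issn": "3333-3333", "url": "http://c.example"},
]
new_download = [
    {"issn": "1111-1111", "url": "https://a.example/j"},
    {"issn": "9999-9999", "url": "https://b.example/j/"},  # ISSN changed, URL same
    {"issn": "5555-5555", "url": "https://e.example"},
]
matched, unmatched, added = match_journals(old, new_download)
print(len(matched), [r["issn"] for r in unmatched], [r["issn"] for r in added])
# 2 ['3333-3333'] ['5555-5555']
```

Separating “removed” from “mystery” among the unmatched rows would take one more lookup against DOAJ’s list of removed titles.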

Changes from GOA5

In general, metrics remain the same, except for the changes in subject assignments.

“Miscellaneous” has been eliminated as a publisher category (there were 138 such journals in GOA5); “o” now stands for “open/other.”

I’m trying to retain Start (starting date), which DOAJ no longer includes in its downloadable data, by looking for the earliest articles in journals that don’t already have such dates.
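Filling in missing start dates amounts to taking the earliest article year found for each journal that lacks one. Here is a sketch with hypothetical field names (`issn`, `start`) and a hypothetical `fill_start` helper. The earliest article visible online may postdate a journal’s actual start, so the result is an approximation.

```python
def fill_start(journals, article_years):
    """Set a missing "start" year to the earliest article year found for that journal.

    Hypothetical fields: each journal is a dict with "issn" and an optional "start";
    article_years maps ISSN -> years of articles located on the journal's site.
    Modifies the dicts in place and also returns the list.
    """
    for journal in journals:
        if journal.get("start") is None:
            years = article_years.get(journal["issn"], [])
            if years:
                journal["start"] = min(years)
    return journals

journals = [{"issn": "a", "start": 1998}, {"issn": "b", "start": None}, {"issn": "c"}]
print(fill_start(journals, {"a": [1990], "b": [2016, 2012, 2019]}))
# [{'issn': 'a', 'start': 1998}, {'issn': 'b', 'start': 2012}, {'issn': 'c'}]
```

Journals that already have a start date keep it, matching the post’s “journals that don’t already have such dates.”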

I may revise subjects in a few cases when the actual contents appear at odds with the assignment made based on DOAJ information. I’d be surprised if there were even a few dozen such changes.

And, of course, the five-year graphs comparing various editions will be six-year graphs. Since the GOA6 paperback that almost nobody buys will once again be full-color, I won’t attempt to make each of the six lines distinct with dot/dash patterns, relying on color in some cases.

Preliminary Subject Counts

The table that follows shows, for each GOA subject, the journal count in GOA5 (“G5”); the number of continuing journals (“Cont”) after subjects have been rescanned and journals have been deleted; the number of newly added journals (“New”); and the preliminary GOA6 count (“Total”). These numbers are subject to small changes due to additions and deletions over the next four days and possible on-the-fly revisions during data checking.

Subject                    G5  Cont  New  Total
Arts & Architecture        369   346   65    411
Computer Science           324   361   53    414
Earth Sciences             450   452   64    516
Language & Literature      893   870  145  1,015
Library Science            165   163   16    179
Media & Communications     245   270   48    318
Other Sciences             230   199   32    231
Political Science          380   415   81    496
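As a quick consistency check, each row of the preliminary table should satisfy Cont + New = Total, and Total minus G5 gives the net change per subject. The sketch transcribes the eight rows shown; `COUNTS` and `net_change` are illustrative names. Cont can exceed G5 (for example, Computer Science and Political Science) because the rescan moved journals between subjects.

```python
# (G5, Cont, New, Total) for the eight subjects in the preliminary table.
COUNTS = {
    "Arts & Architecture": (369, 346, 65, 411),
    "Computer Science": (324, 361, 53, 414),
    "Earth Sciences": (450, 452, 64, 516),
    "Language & Literature": (893, 870, 145, 1015),
    "Library Science": (165, 163, 16, 179),
    "Media & Communications": (245, 270, 48, 318),
    "Other Sciences": (230, 199, 32, 231),
    "Political Science": (380, 415, 81, 496),
}

def net_change(counts):
    """Check Cont + New == Total for every subject; return Total - G5 per subject."""
    changes = {}
    for subject, (g5, cont, new, total) in counts.items():
        if cont + new != total:
            raise ValueError(f"{subject}: {cont} + {new} != {total}")
        changes[subject] = total - g5
    return changes

for subject, change in net_change(COUNTS).items():
    print(f"{subject}: {change:+d}")
```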

Added later on 12/28: Why “June to September”?

Why am I so uncertain about when I’ll be finished with data gathering (visiting 15,668+ web sites at least once, and probably 2,500+ of them twice)?

Because the time required is so unpredictable, as are factors like the time I can or will devote to it, health, crises, etc.

Let’s look at GOA5. The base dataset was 14,128 journals, including just over 2,000 newly added journals. The first pass took 102 days, but I felt rushed all the time. The second pass involved 2,476 journals, of which 1,479 required a third visit and 636 a fourth, for a total of 4,591 additional visits. Those passes took a total of 38 days. So, let’s see: I averaged 139 journals a day on the first pass (I’m guessing more like 160-170/day for continuing journals and 100/day for new ones) and 120/day on the rechecks.
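The GOA5 rates follow directly from those figures. This small calculation also shows that rechecks came to about 32.5% of the base dataset, close to the 32% assumed for GOA6. The constant names are just for illustration.

```python
# Figures from the GOA5 paragraph.
GOA5_JOURNALS = 14_128
FIRST_PASS_DAYS = 102
RECHECK_VISITS = 2_476 + 1_479 + 636  # second, third, and fourth visits
RECHECK_DAYS = 38

first_pass_rate = GOA5_JOURNALS / FIRST_PASS_DAYS  # journals per day
recheck_rate = RECHECK_VISITS / RECHECK_DAYS       # visits per day
recheck_share = RECHECK_VISITS / GOA5_JOURNALS     # rechecks as a share of the base

print(f"{RECHECK_VISITS:,} recheck visits")        # 4,591 recheck visits
print(f"first pass: {first_pass_rate:.1f}/day")    # first pass: 138.5/day
print(f"rechecks: {recheck_rate:.1f}/day")         # rechecks: 120.8/day
print(f"recheck share: {recheck_share:.1%}")       # recheck share: 32.5%
```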

This time, assuming a net gain of 12 journals over the next four days, there will be around 15,680 journals, of which around 2,185 are new. But I’ll also be adding starting dates to the new journals (2,173 of them as of today) and rechecking starting dates on others. So figure an average of anywhere from 100 to 130 journals/day, which means 120 to 157 days. Assume that total rechecks amount to 32% of the original count, or 5,018 visits, at 100-120/day, adding 43 to 51 days.

So it’s fair to assume 163 to 208 days, if all goes well. That means I could be done with data gathering by mid-June, but it could also take until the end of July (again, assuming all goes well, including my energy).
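The projection can be reproduced by rounding each pass up to whole days: 121 + 42 days at the fast end and 157 + 51 at the slow end. The post’s intermediate figures (120 and 43) split the rounding slightly differently, but the totals match. The rates and the 32% recheck share are the post’s own assumptions; the helper name `days` is just for illustration.

```python
import math

JOURNALS = 15_680                  # projected GOA6 base
RECHECKS = round(JOURNALS * 0.32)  # 32% recheck share -> 5,018 visits

def days(visits, rate):
    """Whole days needed for `visits` site visits at `rate` visits per day."""
    return math.ceil(visits / rate)

best = days(JOURNALS, 130) + days(RECHECKS, 120)   # 121 + 42
worst = days(JOURNALS, 100) + days(RECHECKS, 100)  # 157 + 51
print(f"{best} to {worst} days")                   # 163 to 208 days
```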

It took about a month after data gathering to process the data and prepare GOA5, and I’m guessing about the same this year. So the uploaded dataset and GOA6 could be ready by mid-July, but they could take until early September. Figure less than a month after that to prepare the Countries book.
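Turning day counts into calendar dates needs a start date. This sketch assumes data gathering begins January 1, 2021 (the day after the final download) and that processing takes 30 days; the `milestones` helper is hypothetical. Under those assumptions, data gathering ends June 13 or July 28 and the finished dataset and book land on July 13 or August 27, so the early-September outer bound leaves some slack.

```python
from datetime import date, timedelta

def milestones(gather_days, start=date(2021, 1, 1), processing_days=30):
    """Return (end of data gathering, end of processing) for a given day count.

    Assumed: gathering starts January 1, 2021 and processing takes 30 days.
    """
    done = start + timedelta(days=gather_days)
    return done, done + timedelta(days=processing_days)

for n in (163, 208):
    done, ready = milestones(n)
    print(n, done.isoformat(), ready.isoformat())
# 163 2021-06-13 2021-07-13
# 208 2021-07-28 2021-08-27
```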

It’s conceivable that GOA6 could be ready in June, but it’s highly unlikely. I can’t reasonably devote more than about 30 hours/week to this project: I’m retired, I’m old, there are all the other facets of life to deal with, and, perhaps most important, I know from experience that doing more than 20 journals at a time without a break just doesn’t work (and the breaks tend to get longer and longer).

So: July-August most likely, late June barely possible, September also possible.

GOAJ5: November 2020 report

Friday, December 4th, 2020

Readership for GOA5, plus continuing reporting on GOA4.

All links available from the project home page, as always.

GOA5: 2014-2019

  • The dataset: 335 views, 70 downloads, plus an unknown number of uses via a third-party dashboard that incorporates the dataset.
  • GOA5: 557 PDF ebooks. Two paperbacks (full color, highly recommended).
  • Countries 5: 98 PDF ebooks

GOA4: 2013-2018

  • The dataset: 836 views, 322 downloads.
  • GOA4: 4,197 PDF ebooks
  • Countries 4: 588 PDF ebooks
  • Subjects and Publishers: 476 PDF ebooks