Archive for the ‘open access’ Category

Gold Open Access 6: Halfway Mark

Monday, February 22nd, 2021


I’m now just over halfway through the initial journal scan for GOA6 (8,000 of 15,676), so it’s a good time for a quick progress report.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked through publisher PPCT and title Kimün. Revista Interdisciplinaria de Formación Docente. So far, the 2020 article count is 704,825, and that will almost certainly go up slightly. The 2019 total for this set of journals is 586,583 articles. [Yes, the substantial gain for 2020 appears to be legit: for one thing, quite a few MDPI journals saw substantial increases in articles in 2020. And there are quite a few more journals.]

Last year, that range of publishers included 7,155 journals, which published 569,645 articles in 2019. So there’s a net gain of 845 added journals so far.

Note that the remainder of last year’s journals accounted for 281,668 2019 articles, not quite half as many as the first half. So we could wind up with a million articles, but certainly not 1.4 million or close to it. (I’d say a million is probable unless rechecking shows major problems in the figures.)
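For readers who want to see why a million is plausible and 1.4 million is not, here is a minimal Python sketch using only the counts quoted above. The third estimate rests on an assumption that is not in the post: that the unchecked journals grow from 2019 to 2020 at the same rate as the ones already checked. It also ignores journals newly added in the remaining range, so treat it as a rough illustration rather than a forecast.

```python
# Article counts quoted in the post.
ARTICLES_2020_CHECKED = 704_825    # 2020 articles in the 8,000 journals checked so far
ARTICLES_2019_CHECKED = 586_583    # 2019 articles in those same journals
ARTICLES_2019_REMAINDER = 281_668  # 2019 articles in the rest of last year's journals

# Naive guess: the second half of the list looks just like the first half.
naive_double = 2 * ARTICLES_2020_CHECKED

# Conservative guess: the remainder publishes exactly what it did in 2019.
flat_remainder = ARTICLES_2020_CHECKED + ARTICLES_2019_REMAINDER

# ASSUMPTION (not from the post): the remainder grows at the same
# 2019-to-2020 rate as the journals already checked.
growth = ARTICLES_2020_CHECKED / ARTICLES_2019_CHECKED
scaled_remainder = ARTICLES_2020_CHECKED + ARTICLES_2019_REMAINDER * growth

print(f"naive doubling:         {naive_double:,}")
print(f"remainder flat at 2019: {flat_remainder:,}")
print(f"remainder scaled:       {scaled_remainder:,.0f}")
```

The naive doubling is where a figure near 1.4 million would come from; the other two estimates bracket the one-million mark.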

For this group of 3,200 journals–ignoring the first 4,800–problematic journals include 129 malware cases (that’s out of 3,200–the malware numbers are bad this year, but the latest 1,600 didn’t make them much worse, adding 43), 146 unreachable/unworkable, as well as 47 that had to be reached at new addresses (entirely because DergiPark moved from .gov to .org). These will all be rechecked.

Looking more closely at the malware cases for these 3,200 journals [thus including the fourth report], there are nine security-certificate problems, one spyware, seven malware in general, 18 phishing–and 94 Trojans. Looking at countries in this 20%, I see 70 in Indonesia, 21 Ukraine, 9 Serbia, 3 each in Brazil and Poland, two each in Colombia, Kenya and Turkey, and seventeen singletons.

Now, on to the next 1,600… and I might stop providing problematic-journal details, if the reports continue at all. Perhaps the most striking thing in this segment is that the 58 largest journals in the latest 1,600–mostly from MDPI, but with a few from Nature Publishing Group and others–went from 129,856 articles in 2019 to 184,357 in 2020.



GOA6: Progress Report 4

Thursday, February 11th, 2021


I’m now just over 40% of the way through the initial journal scan for GOA6 (6,400 of 15,676), so it’s a good time for a quick progress report.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked through Magazine House of Cancer Research on Prevention and Treatment; so far, the 2020 article count is 443,866, but that will almost certainly go up slightly. The 2019 total for this set of journals is 386,364 articles.

Last year, that range of publishers included 5,775 journals, which published 376,629 articles in 2019. So there’s a net gain of 625 added journals so far.

For this group of 1,600 journals–ignoring the first 4,800–problematic journals include 86 malware cases (that’s out of 1,600–the malware numbers are VERY bad this year), 88 unreachable/unworkable, as well as 23 that had to be reached at new addresses (entirely because DergiPark moved from .gov to .org). These will all be rechecked.

Looking more closely at the surprisingly high number of malware cases for these 1,600 journals, there’s one security-certificate problem, one spyware, six malware in general, 15 phishing–and 63 Trojans. Looking at countries in this 10%, I see one case each in Brazil and India, 54 in Indonesia (mostly Trojan, some phishing, almost all at academic institutions), one each in New Zealand, Pakistan and Romania, two in Russia, seven in Serbia, one each in South Korea, Taiwan, Thailand and Turkey, one in the UK–and 14 in Ukraine, 13 of them from LLC “CPC “Business Perspectives”–that is, Trojan in the base software for all 13 of that publisher’s journals.

Looking back at the 174 malware cases in the first 4,800 journals, again by country, I see one each in Argentina and Bangladesh, four in Belarus, 43 in Brazil–35 of them from the same publisher, Conselho Nacional de Pesquisa e Pós-graduação em Direito (CONPEDI) (another root-software Trojan); one from Chile, three from Croatia, one from Cuba, four from Ecuador, two from Germany, and 75 from Indonesia (that’s in addition to the 54 in the latest 1,600); then one from Iraq, two from Mexico, one from Moldova, six security-certificate problems from BRILL in Netherlands, one each from Pakistan, Poland and Portugal, ten from Romania, seven from Serbia, one each from Slovakia and South Africa, three from Spain, and one each from Ukraine and the US. (Portugal? That was a top-level domain issue, and I’m now ignoring these–most are .info and shouldn’t be flagged at all.)

OK, way too much detail on malware issues, but they seem to be getting worse. Two years ago, before the pandemic, DOAJ and its contacts were able to correct nearly all malware cases. Not so last year, and I don’t know what will happen this year. At the end of this project for this year, I will send DOAJ a list of all journals with malware in both years and the suggestion that they be removed from the directory, possibly after one final attempt to get them to fix the problem.

Now, on to the next 1,600 and the halfway mark. (Yes, I’m still on pace to be finished with the first pass in late April; I hope that continues.)



Gold Open Access 6: Progress Report 3

Sunday, January 31st, 2021


I’m now just over 30% of the way through the initial journal scan for GOA6 (4,800 of 15,676), so it’s a good time for a quick progress report.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked halfway through Immanuel Kant Baltic Federal University (Baltic Region); so far, the 2020 article count is 377,436, but that will almost certainly go up slightly. [Does that mean the total will be over a million articles? WAY too early to say.] The 2019 total for this set of journals is 321,513 articles.

Last year, that range of publishers included 4,398 journals, which published 317,381 articles in 2019. So there’s a net gain of 402 added journals so far.

This year, problematic journals include 174 malware cases–up sharply from the previous report–five that are not OA, and 182 unreachable/unworkable, as well as 25 that had to be reached at new addresses (entirely because DergiPark moved from .gov to .org). These will all be rechecked. Specific problems include 48 404 errors, 33 503 errors, 26 DNS issues, 134 Trojans, 8 phishing, 17 other malware and 14 security-certificate problems.

For what it’s worth, the same range of publishers last year wound up with 20 journals that had malware but could be analyzed, 106 that had to be reached through an alternate address, 69 malware-not-countable cases, and 12 unreachable. So malware is even worse this year, unfortunately.

I’m starting a new segment for completed scans, so the next three progress reports will start from scratch as far as problems are concerned. [Yes, it is going reasonably fast at the moment. Thank Hindawi in part: very easy to gather the needed info.]



Gold Open Access 6: Progress Report 2

Thursday, January 21st, 2021

I’m just over 20% of the way through the initial journal scan for GOA6 (3,200 of 15,676–I discovered another duplicate), so it’s a good time for a quick progress report.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked partway through Elsevier (Medicine in Novel Technology and Devices); so far, the 2020 article count is 226,676, but that will almost certainly go up.

Last year, that range of publishers included 2,965 journals, which published 195,025 articles in 2019. (THE 2019 FIGURE IN THE FIRST PROGRESS REPORT WAS WRONG: that figure should be 100,055.)

This year, problematic journals include 41 malware cases, one that’s not OA, and 65 unreachable/unworkable (half of them Cambridge). These will all be rechecked. In addition to all those 503 errors, I see 13 404, one 403, 9 SSL certificate problems, 1 (other) database error, 7 DNS failures, 3 cases of fraud, one apparently hijacked case, 9 malware, one phishing, 18 trojans, and a few others.

For what it’s worth, the same range of publishers last year wound up with 7 journals that had malware but could be analyzed, 33 that had to be reached through an alternate address, 13 malware-not-countable cases, and one unreachable. I’d guess we’ll wind up with similar proportions this year.

So does doing one-tenth in the first 12 days of the year mean I’ll finish the first pass at the end of April (that is, around 120 days into the year)? Possible but unpredictable. Elsevier journals can be checked very rapidly, maybe even faster than BMC (once I figured out the right advanced-search strategy); I don’t believe most other large clusters are that easy. So we shall see: no predictions until I’m at least three-quarters finished! (Best guess is very late April or early to mid-May.)

Gold Open Access 6: Progress Report 1

Tuesday, January 12th, 2021

I’m just over a tenth of the way through the initial journal scan for GOA6 (1,600 of 15,677), so it’s a good time for a quick progress report.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve checked “Alexandru Ioan Cuza” University of Iași through Casa Cartii de Stiinta Cluj-Napoca (and the next journal’s a different publisher); so far, the 2020 article count is 105,280, but that will almost certainly go up–partly because I recheck journals that publish late and malware/problematic cases, and specifically because all 32 journals from Cambridge University Press failed with 503 errors (!).

Last year, that range of publishers included 1,503 journals, which published 100,055 articles in 2019 (the 73,537 originally given here was actually the 2014 total). It’s hard to make direct comparisons, because journals do change publishers–but so far the rate of newly-added journals is a little lower than I’d expect.

This year, problematic journals include 41 malware cases, one that’s not OA, and 65 unreachable/unworkable (half of them Cambridge). These will all be rechecked. In addition to all those 503 errors, I see 13 404, one 403, 9 SSL certificate problems, 1 (other) database error, 7 DNS failures, 3 cases of fraud, one apparently hijacked case, 9 malware, one phishing, 18 trojans, and a few others.

For what it’s worth, the same range of publishers last year wound up with 7 journals that had malware but could be analyzed, 33 that had to be reached through an alternate address, 13 malware-not-countable cases, and one unreachable. I’d guess we’ll wind up with similar proportions this year.

So does doing one-tenth in the first 12 days of the year mean I’ll finish the first pass at the end of April (that is, around 120 days into the year)? Possible but unpredictable. On one hand, this group includes 300+ BMC journals that could be checked very rapidly (and the Cambridge journals that couldn’t be checked at all); on the other, it’s hard to avoid some doomscrolling while waiting to see how civil war is avoided or dealt with. So we shall see: no predictions until I’m at least three-quarters finished!
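The end-of-April guess above is a straight-line extrapolation, which can be sketched in a few lines of Python. The assumptions here are loud ones and are not claims from the post: the pace of the first 12 days holds for the whole list, work happens every day, and day 1 is January 1, 2021.

```python
import math
from datetime import date, timedelta

TOTAL_JOURNALS = 15_677   # size of the list at the time of this report
DONE_SO_FAR = 1_600       # journals checked in the first 12 days
DAYS_SO_FAR = 12

# ASSUMPTION: the daily pace stays constant for the rest of the first pass.
pace = DONE_SO_FAR / DAYS_SO_FAR          # journals per day
days_needed = TOTAL_JOURNALS / pace       # total days for the first pass

# Day 1 is January 1, so day N falls N - 1 days after it.
finish = date(2021, 1, 1) + timedelta(days=math.ceil(days_needed) - 1)

print(f"pace: {pace:.0f} journals/day")
print(f"days needed: {days_needed:.1f}")
print(f"projected finish: {finish:%B %d, %Y}")
```

Any slowdown on harder publisher clusters pushes the date later, which is why the post declines to predict.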

Gold Open Access 6: Early Notes

Monday, December 28th, 2020

I’m almost ready to start data gathering for Gold Open Access 6 (2015-2020). Thursday afternoon after 4 pm (that is, after midnight January 1 UTC), I’ll do a final download of DOAJ metadata and check the Adds & Deletions spreadsheet–deleting any journals deleted today (December 28) through December 31, and massaging rows of the metadata added since December 27 into new rows of the master sheet used to gather data. I’m guessing there will be no more than one or two deletions and perhaps a dozen additions since I checked this at 12:30 AM (UTC) December 28. [That guess is based on the fact that three titles were deleted and 41 added between December 15 and December 28.]

I downloaded early to do a more thorough job of checking consistency and, for the first time in years, rechecking subject assignments against the subjects and keywords in the DOAJ rows. That process included catching errors from previous years (most of them from VERY early years) and being somewhat more consistent in ambiguous cases–e.g., more journals that cover sustainability going into Ecology, nearly all journals on nutrition going into Medicine, nearly all journals with tourism as a primary focus going into Anthropology, and generally replacing Technology with more specific subjects as appropriate.

Processing the December 15 download yielded 13,528 matches against the GOA5 master spreadsheet, plus 2,103 new titles. Of unmatched GOA5 journals, 456 were explicitly removed from DOAJ and 90 are the usual “small number of mysteries”–most of them cases where a journal has changed both ISSN and normalized URL during the past year. Including the additions and deletes done yesterday, there are currently 15,668 titles in the master spreadsheet; the final number should be slightly higher. Whereas GOA5 wound up with slightly fewer than 14,000 journals being fully analyzed, it’s very likely that GOA6 will include significantly more than 15,000 fully analyzed journals. Will it reach a million articles? Probably not, but we shall see…somewhere between June and September, depending on health, other activities, and how difficult it is to do the manual checking.

Changes from GOA5

In general, metrics remain the same, except for the changes in subject assignments.

“Miscellaneous” has been eliminated as a publisher category (there were 138 such journals in GOA5); “o” now stands for “open/other.”

I’m trying to retain Start (starting date), which DOAJ no longer includes in its downloadable data, by looking for the earliest articles in journals that don’t already have such dates.

I may revise subjects in a few cases when the actual contents appear at odds with the assignment made based on DOAJ information. I’d be surprised if there were even a few dozen such changes.

And, of course, the five-year graphs comparing various editions will be six-year graphs. Since the GOA6 paperback that almost nobody buys will once again be full-color, I won’t attempt to make each of the six lines distinct by dot/dash patterns, relying on color in some cases.

Preliminary Subject Counts

The table that follows shows, for each GOA subject, the journal count in GOA5 (“G5”); the number of continuing journals (“Cont”) after subjects have been rescanned and journals have been deleted; the number of newly-added journals (“New”); and the preliminary GOA6 count (“Total”). These numbers are subject to small changes due to additions and deletions over the next four days and possible on-the-fly revisions during data checking.

Subject                      G5    Cont     New   Total
Agriculture                 552     518      67     585
Anthropology                549     585      99     684
Arts & Architecture         369     346      65     411
Biology                     424     384      59     443
Chemistry                   181     187      32     219
Computer Science            324     361      53     414
Earth Sciences              450     452      64     516
Ecology                     367     399      75     474
Economics                   926     854     133     987
Education                   961     883     149   1,032
Engineering                 535     470      69     539
History                     395     436      70     506
Language & Literature       893     870     145   1,015
Law                         441     436      84     520
Library Science             165     163      16     179
Mathematics                 266     284      54     338
Media & Communications      245     270      48     318
Medicine                  2,963   2,855     441   3,296
Miscellany                  213     198      48     246
Other Sciences              230     199      32     231
Philosophy                  243     245      53     298
Physics                     171     181      35     216
Political Science           380     415      81     496
Psychology                  241     243      32     275
Religion                    288     286      48     334
Sociology                   640     529      66     595
Technology                  250     167      22     189
Zoology                     276     279      33     312
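
As a consistency check on the table, each Total should equal Cont plus New. The short Python sketch below transcribes the rows (reading each as G5, Cont, New, Total) and verifies that; the variable names are illustrative only, and the transcription itself is an assumption about how the columns split.

```python
# (subject, G5, Cont, New, Total), transcribed from the table above.
ROWS = [
    ("Agriculture", 552, 518, 67, 585),
    ("Anthropology", 549, 585, 99, 684),
    ("Arts & Architecture", 369, 346, 65, 411),
    ("Biology", 424, 384, 59, 443),
    ("Chemistry", 181, 187, 32, 219),
    ("Computer Science", 324, 361, 53, 414),
    ("Earth Sciences", 450, 452, 64, 516),
    ("Ecology", 367, 399, 75, 474),
    ("Economics", 926, 854, 133, 987),
    ("Education", 961, 883, 149, 1032),
    ("Engineering", 535, 470, 69, 539),
    ("History", 395, 436, 70, 506),
    ("Language & Literature", 893, 870, 145, 1015),
    ("Law", 441, 436, 84, 520),
    ("Library Science", 165, 163, 16, 179),
    ("Mathematics", 266, 284, 54, 338),
    ("Media & Communications", 245, 270, 48, 318),
    ("Medicine", 2963, 2855, 441, 3296),
    ("Miscellany", 213, 198, 48, 246),
    ("Other Sciences", 230, 199, 32, 231),
    ("Philosophy", 243, 245, 53, 298),
    ("Physics", 171, 181, 35, 216),
    ("Political Science", 380, 415, 81, 496),
    ("Psychology", 241, 243, 32, 275),
    ("Religion", 288, 286, 48, 334),
    ("Sociology", 640, 529, 66, 595),
    ("Technology", 250, 167, 22, 189),
    ("Zoology", 276, 279, 33, 312),
]

# Rows where Cont + New does not add up to Total (expected: none).
mismatches = [name for name, _, cont, new, total in ROWS if cont + new != total]

# Column sums across all subjects.
g5_journals = sum(row[1] for row in ROWS)
new_journals = sum(row[3] for row in ROWS)
total_journals = sum(row[4] for row in ROWS)

print("rows that do not add up:", mismatches)
print(f"G5: {g5_journals:,}  New: {new_journals:,}  Total: {total_journals:,}")
```

Every row adds up, and the Total column sums to 15,668, which matches the current size of the master spreadsheet.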

Added later on 12/28: Why “June to September”?

Why am I so uncertain when I’ll be finished with data gathering (visiting 15,668+ web sites at least once, and probably 2,500+ of them twice)?

Because the time required is so unpredictable, as are factors like the time I can or will devote to it, health, crises, etc.

Let’s look at GOA5. The base dataset was 14,128 journals, including just over 2,000 newly-added journals. The first pass took 102 days–but I felt rushed all the time. The second pass involved 2,476 journals, of which 1,479 required a third visit and 636 a fourth visit–a total of 4,591 additional visits. Those passes took a total of 38 days. So, let’s see, I was able to do an average of 139 journals a day on the first pass–I’m guessing more like 160-170/day for continuing and 100/day for new–and 120/day for the rechecks.
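The GOA5 averages in the paragraph above follow from simple division. A minimal Python sketch, using only the numbers quoted there (the variable names are illustrative):

```python
# GOA5 figures quoted in the post.
BASE_JOURNALS = 14_128
FIRST_PASS_DAYS = 102
REVISITS = [2_476, 1_479, 636]   # second, third and fourth visits
RECHECK_DAYS = 38

first_pass_rate = BASE_JOURNALS / FIRST_PASS_DAYS   # journals per day, first pass
total_revisits = sum(REVISITS)                       # additional visits after the first pass
recheck_rate = total_revisits / RECHECK_DAYS         # visits per day during rechecks

print(f"first pass: {first_pass_rate:.1f} journals/day")
print(f"additional visits: {total_revisits:,}")
print(f"rechecks: {recheck_rate:.1f} visits/day")
```

The results land on the post's rounded figures of about 139 a day for the first pass and about 120 a day for rechecks.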

This time, assuming a net gain of 12 journals over the next four days, there will be around 15,680, of which around 2,185 are new. But I’ll be adding starting dates to those 2,173 (and rechecking them on others). So figure anywhere from 100 to 130 journals/day average. That means 120 to 157 days. Assume that total rechecks amount to 32% of the original count, or 5,018, at 100-120/day, adding 43 to 51 days.

So it’s fair to assume at least 163 days to 208 days, if all goes well. So I could be done with data gathering by mid-June, but it could also take until the end of July–again, assuming all goes well, including my energy.
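The day ranges above can be reproduced with a short Python sketch. Two assumptions are mine rather than the post's: work starts on January 1, 2021, and it continues every calendar day. The helper name `day_range` is hypothetical. Because the sketch keeps fractions instead of rounding each step, its results differ from the post's figures by about a day.

```python
import math
from datetime import date, timedelta

def day_range(visits, slow_rate, fast_rate):
    """Return (fewest, most) days needed for `visits` at the given daily rates."""
    return visits / fast_rate, visits / slow_rate

TOTAL_JOURNALS = 15_680                     # expected size of the GOA6 list
RECHECKS = round(TOTAL_JOURNALS * 0.32)     # 32% of the list, per the post

first_lo, first_hi = day_range(TOTAL_JOURNALS, slow_rate=100, fast_rate=130)
re_lo, re_hi = day_range(RECHECKS, slow_rate=100, fast_rate=120)

best_case = first_lo + re_lo
worst_case = first_hi + re_hi

# ASSUMPTION: day 1 is January 1, 2021, and every day is a working day.
start = date(2021, 1, 1)
earliest = start + timedelta(days=math.ceil(best_case) - 1)
latest = start + timedelta(days=math.ceil(worst_case) - 1)

print(f"first pass: {first_lo:.0f} to {first_hi:.0f} days")
print(f"rechecks:   {re_lo:.0f} to {re_hi:.0f} days")
print(f"finish between {earliest:%B %d} and {latest:%B %d}")
```

Under those assumptions the window runs from mid-June to late July, in line with the estimate in the text.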

It took about a month after data gathering to process the data and prepare GOA5. I’m guessing about the same this year. So the uploaded dataset and GOA6 could be ready by mid-July, but it could take until early September. Figure less than a month to prepare the Countries book.

It’s conceivable that GOA6 could be ready in June, but it’s highly unlikely. I can’t reasonably devote more than about 30 hours/week to this project: I’m retired, I’m old, there are all the other facets of life to deal with, and–perhaps most important–I know from experience that doing more than 20 journals at a time without a break, with breaks getting longer and longer, just doesn’t work.

So: July-August most likely, late June barely possible, September also possible.

Meanwhile, the color paperback GOA5 is a really great way to read about Gold Open Access; it’s a shame only two copies have been purchased. (My profit on each copy is $0.68, if you’re wondering.)

GOAJ5: November 2020 report

Friday, December 4th, 2020


Readership for GOA5, plus continuing reporting on GOA4.

All links available from the project home page, as always.

GOA5: 2014-2019

  • The dataset: 335 views, 70 downloads–and some unknown number of uses from a third-party dashboard incorporating the dataset.
  • GOA5: 557 PDF ebooks. Two paperbacks (full color, highly recommended).
  • Countries 5: 98 PDF ebooks

GOA4: 2013-2018

  • The dataset: 836 views, 322 downloads.
  • GOA4: 4,197 PDF ebooks
  • Countries 4: 588 PDF ebooks
  • Subjects and Publishers: 476 PDF ebooks



Gold Open Access 5: October 2020 report

Monday, November 2nd, 2020


Readership for GOA5, plus continuing reporting on GOA4.

All links available from the project home page, as always.

GOA5: 2014-2019

  • The dataset: 295 views, 63 downloads–and some unknown number of uses from a third-party dashboard incorporating the dataset.
  • GOA5: 529 PDF ebooks. Two paperbacks (full color, highly recommended).
  • Countries 5: 82 PDF ebooks

GOA4: 2013-2018

  • The dataset: 799 views, 318 downloads.
  • GOA4: 4,106 PDF ebooks
  • Countries 4: 578 PDF ebooks
  • Subjects and Publishers: 468 PDF ebooks



There will be a GOA6

Tuesday, October 27th, 2020

Thanks to SPARC’s continuing support, there will be a Gold Open Access 2015-2020: Articles in Journals appearing sometime in the summer of 2021 (barring health or other disasters).

The study will follow the same pattern as GOA5. I’ll download DOAJ metadata in late December 2020 to do the first match and consistency checking, and will determine currency exchange rates to be used for the project (as with this year, they’ll be the median 2020 rate where that’s available, the rate on the December date I check them otherwise–and they’ll be on a tab in the freely-available spreadsheet). Then I’ll download data again shortly after midnight (GMT) on January 1, 2021, and process changes.

I currently plan two changes–neither major. First, I’m finishing the process of getting rid of “Miscellaneous” as a publisher category. (It accounted for considerably less than 1% of articles in GOA5.) Any publishers not categorized as university/college, society/government, or traditional publisher, will be marked as Open Access.

Second, I sense that a few subject-assignment errors crept in some years ago (due to quirks in Excel at the time). I’ve already rechecked subject assignments against the DOAJ-supplied subject information for all journals in GOA5, changing a few dozen in the process, and will complete that process in January. (The changes would only be significant in subject breakdowns for a few countries.)

Other than those changes, I’ll aim for consistency, and add a sixth row to graphs in a Six-Year Comparisons chapter. Once again, the data will be freely available at Figshare (or some other repository), the report will be available as a free PDF or a cost-of-publication color trade paperback, and there will be a Country of Publication report.

I won’t even begin to guess dates or volume. I’d love to see it emerge in July 2021, but won’t predict that. I’d love to see at least 15,000 fully-analyzed journals–and that seems fairly plausible. I’d guess there will be more than 900,000 2020 articles, but would be loath to project a million. We shall see.

Gold Open Access 5: September 2020 report

Friday, October 2nd, 2020

Readership for GOA5, plus continuing reporting on GOA4.

All links available from the project home page, as always.

GOA5: 2014-2019

  • The dataset: 223 views, 39 downloads–and some unknown number of uses from a third-party dashboard incorporating the dataset.
  • GOA5: 406 PDF ebooks. Two paperbacks (full color, highly recommended).
  • Countries 5: 57 PDF ebooks

GOA4: 2013-2018

  • The dataset: 774 views, 302 downloads.
  • GOA4: 3,581 PDF ebooks
  • Countries 4: 553 PDF ebooks
  • Subjects and Publishers: 441 PDF ebooks