Author Archive

GOA6: Second part of second pass

Monday, May 10th, 2021


I just finished the second part of the second data-gathering pass–rechecking 732 journals that had problems other than malware. Most of those problems were resolved, either because the journal’s host fixed a problem or because I could reach the journal at a different URL by searching on title and ISSN. The pass found 21 no longer in DOAJ and left 126 to be rechecked one last time. [The check of journals against those removed from DOAJ since January 1, 2021 yielded 40 cases–and this part of the scan yielded another 21 no longer in DOAJ. Since many journals have two ISSNs, but I save only one and the removal list provides only one, this oddity is not surprising.]

At this point, there are 1,049,954 articles from 2020 and 879,377 from 2019, from 14,635 fully-analyzed journals–numbers that probably won’t grow a lot.

Next step: a quick check of xm/malware journals for resolutions–then, beginning May 15, the final check.



GOA6: First part of second pass

Sunday, May 2nd, 2021

I just finished the first part of the second data-gathering pass–rechecking slightly more than 2,200 journals that (a) seemed likely to have another 2020 issue appear in early 2021 or (b) had less than 2/3 as many 2020 articles as 2019.
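
Rule (b) can be sketched in a few lines of Python. This is only an illustration: the function name, the data layout and the sample journals are hypothetical and do not come from the GOA6 spreadsheet; only the two-thirds threshold is taken from the text.

```python
from fractions import Fraction

# Threshold from the post: recheck when 2020 articles are less than 2/3 of 2019.
THRESHOLD = Fraction(2, 3)


def needs_recheck(articles_2019: int, articles_2020: int) -> bool:
    """Return True when the 2020 count is less than 2/3 of the 2019 count."""
    if articles_2019 == 0:
        return False  # no 2019 articles, so there is nothing to compare against
    return articles_2020 < THRESHOLD * articles_2019


# Hypothetical sample data: title -> (2019 count, 2020 count).
sample = {
    "Journal A": (30, 28),
    "Journal B": (30, 19),
    "Journal C": (12, 0),
}

flagged = [title for title, (y19, y20) in sample.items() if needs_recheck(y19, y20)]
print(flagged)
```

Under this rule, any journal with 2019 articles and no 2020 articles is always flagged, which is why the rechecked set leans so heavily toward 2019.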

In all, counts were changed in 690 of the journals–mostly added 2020 counts but with some changes for 2019 (and, rarely, earlier years) as well. At the end of the process, the 2,200-odd journals show 56,268 articles for 2020 (but 92,225 for 2019, since (b) above meant that every journal with 2019 articles and no 2020 articles was rechecked).

Naturally, the success rate for additional articles declined as the scan progressed, since the time lapse between scans shrank. Counts changed for 273 of the first 550 journals, 207 of the second 550, 106 of the third 550 and 104 of the last 560+.

At this point, 14,054 journals are in place for full analysis, with 1,028,737 articles for 2020 and 860,799 for 2019.

Next step: check the remaining 1,612 journals against the list of journals removed from DOAJ between January 1 and May 2. Then recheck all the journals that were problematic for some reason other than malware, since most of those should be transitory problems.

GOA6: First pass completed

Wednesday, April 21st, 2021

First things first: If you’re in a position to help resolve some of the very large number of journals with malware (787) or ones that were unreachable or unworkable (752), there’s a spreadsheet with the key information for all of them here:

https://docs.google.com/spreadsheets/d/19gXpn3kVn-R33uDdOUSHgssPaLO5CEPRWUGPvDB9az0/edit?usp=sharing

And I won’t do the final piece of the multistep “second pass” until at least May 15. Help from folks with colleagues in Indonesian or Brazilian academia would be most helpful. (The spreadsheet, g6x, is sorted first by code, then by country, then by publisher, then by journal. The second page lists the codes and notes.)

Here’s where things stand. 15,666 journals had 1,018,364 articles in 2020, up from 890,069 2019 articles. The 2020 number will rise somewhat, both because some journals are late to publish issues and because the numbers don’t include *any* of the malware and unreachable journals (but 2019 numbers do). For GOA5, the 2019 total was 854,018 articles.

The 15,666 (yes, I know, I say 15,667 sometimes–it’s hard to remember to subtract one for the row of labels) include:

  • 13,391 “a”–regular–journals
  • 317 “bi”–no articles in 2019 or 2020, mostly ceased, renamed, changed publishers or otherwise disappeared
  • 3 bm–early cases of journals with malware that could be reached through other addresses
  • 343 bx–journals available at a different URL than the one in DOAJ. There will probably be quite a few more of these; nearly all at present are either Sciendo (from DeGruyter) or dergipark, moved from .gov to .org without generally changing DOAJ records.
  • 58 xd: journals with no articles later than 2014, most of them “duplicates” that have been superseded.
  • 787 xm: Malware
  • 14 xn: Apparently not OA.
  • 1 xt: A website I couldn’t translate or make enough sense of to count
  • 752 xx: Unreachable (404, etc.) or unworkable (db errors, etc.)
  • So far, I see 4,371 journals with fees, 9,706 with no fees, and a few hundred needing rechecking (mostly newly-added journals that are xm or xx).
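
The category counts above can be checked against the 15,666 total with a short tally. This is a minimal sketch; treating every code that starts with "x" as excluded from full analysis is an assumption made here for illustration, not something the list states outright.

```python
# Category counts as listed above (code -> number of journals).
counts = {
    "a": 13_391,
    "bi": 317,
    "bm": 3,
    "bx": 343,
    "xd": 58,
    "xm": 787,
    "xn": 14,
    "xt": 1,
    "xx": 752,
}

total = sum(counts.values())

# Assumption: codes beginning with "x" are the ones set aside from full analysis.
excluded = sum(n for code, n in counts.items() if code.startswith("x"))
analyzed = total - excluded

print(total, excluded, analyzed)  # prints: 15666 1612 14054
```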

Now, after ignoring journals for a day or two, I’ll recheck 2,211 journals for added issues/articles and 1,613 to try to clear malware and unreachable cases. (The 2,211 includes 946 cases marked along the way and 1,265 where there were at least 1.5 times as many articles in 2019 as in 2020–the original version of this paragraph had incorrect numbers here; fortunately, the correction means fewer to check.)

As already noted, the final malware pass will start no earlier than May 15. If all goes well, the primary book and spreadsheet should be ready in very late June or early July.

GOA6: Ninth Update

Saturday, April 10th, 2021


Time for another GOA6 checkpoint, at 14,400 of 15,676.

Note that, as always, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked partway through the University of Isfahan. So far, the 2020 article count is 942,685, and that will go up. The 2019 total for this set of journals is 830,018 articles.

Last year, that range of publishers included 12,835 journals, which published 792,068 articles in 2019. So there’s a net gain of 1,562 added journals so far. A million overall articles still seems likely, but not certain.

For this group of 1,600 journals–ignoring the first 12,600–problematic journals include 308 malware cases and 75 or so unreachable/unworkable. Yes, that’s a terribly high malware ratio.

Looking more closely at the malware cases for these 1,600 journals, there are ten security-certificate problems, seven ransomware, ten malware, 24 phishing and 256 Trojans.

The problem is mostly Indonesia: 842 of the 1,600 journals in this group are from Indonesia, and 281 of those have malware, mostly at the root URL for a university’s set of journals.

I checked all 14,400 journals scanned so far. Of 764 total malware cases, 481 are from Indonesia. Brazil is a distant second at 121, with smaller clusters from Romania and Spain (and a few cases elsewhere). Yes, Indonesia has more DOAJ-listed journals than any other country, but 481 of Indonesia’s 1,745 (so far) are problematic; Brazil has the second-most journals, and 121 of 1,578 are problematic. (All these figures exclude the remaining 1,276 journals–but only 26 of those are from Indonesia.)
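
To put those country figures on a common scale, here is a small Python sketch that turns them into percentages. The variable names are illustrative; the counts are the ones given above.

```python
# Figures from the paragraph above: (malware cases, DOAJ-listed journals so far).
by_country = {
    "Indonesia": (481, 1745),
    "Brazil": (121, 1578),
}
total_malware = 764  # malware cases among the 14,400 journals scanned so far

for country, (malware, journals) in by_country.items():
    rate = malware / journals          # share of the country's journals with malware
    share = malware / total_malware    # share of all malware cases found so far
    print(f"{country}: {rate:.1%} of its journals, {share:.1%} of all malware cases")
```

On these numbers, roughly 28% of Indonesia's journals are affected against about 8% of Brazil's, so the gap reflects a higher rate and not only a larger journal count.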

I believe attempts have been made to alert publishers to malware problems. Some may be made again this year. This is a continuing problem.

I’d say it’s now nearly certain that the first scan will be done in late April, barring illness or other unexpected events. That would leave some checking and the long rescans. (So far, about 2,200 journals need rechecking; the final number will probably exceed 2,300. Rechecking can be a slow process.)

So no overall target date yet…



GOA6: Update 8

Tuesday, March 30th, 2021


Time for another GOA6 checkpoint, at 12,800 of 15,676.

Note that, as always, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked through publisher Universidade Federal do Rio de Janeiro and title Mana. So far, the 2020 article count is 911,525, and that will almost certainly go up. The 2019 total for this set of journals is 741,882 articles.

Last year, that range of publishers included 11,402 journals, which published 758,050 articles in 2019. So there’s a net gain of 1,398 added journals so far.

For this group of 1,600 journals–ignoring the first 11,200–problematic journals include 53 malware cases and 109 unreachable/unworkable.

Looking more closely at the malware cases for these 1,600 journals, there are eight security-certificate problems, five phishing and 39 Trojans–including seven at Universidade Federal de Alagoas and five at Universidade Estadual de Montes Claros.

How confident am I that we’ll reach a million articles? The remaining 2,866 journals had 95,177 articles in 2019, so it’s not certain, but likely. We shall see…

This is an interesting segment, nearly all university journals from Latin American countries or Spain and Portugal. [Actually, one from Sweden, 32 from Portugal, 160 from Spain and all the rest from 18 Latin American countries, with Brazil accounting for 743.] Unsurprisingly, that also means an even higher percentage of no-fee/diamond than overall (likely to be around 70%): of the 1,434 journals fully analyzed out of this 1,600, only 48 have fees.

I’d say it’s now very probable that the first scan will be done in late April, barring illness or other unexpected events–other things are taking up more time, but some 400 of the remaining 2,866 should be relatively fast. We shall see. That would leave some checking and the long rescans. (So far, about 1,800 journals need rechecking; the final number will probably exceed 2,000.)

So no overall target date yet…



Angry?

Friday, March 26th, 2021

Just for fun, I’ve been going through my listening collection–all ripped from owned CDs using MusicBee to FLAC, played back on a Cowon Plenue high-fidelity player–by “genre,” presumably supplied by crowdsourcing to whatever metadata database MusicBee uses. (Background)

Last night, I finished one odd genre and scrolled to the next: Angry.

So what’s included (from my collection, that is)?

One album: No Secrets, by Carly Simon.

Really? Angry? The album shows a confident, talented woman. One song (the basis for the album title) shows her disappointed in her lover/boyfriend/spouse/whatever. Another, the big hit, is “You’re So Vain.” Of the songs on the album, those are as close as I could come to anything even resembling anger, and you’d really be stretching it in either case (especially the latter, which I still love).

My thought went out to whoever supplied that genre: I hope you got help.

GOA6: Seventh note

Wednesday, March 17th, 2021

Time for another GOA6 checkpoint, at 11,200 of 15,676.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked through publisher Universidad de Guadalajara and title Sincronía. So far, the 2020 article count is 860,506, and that will almost certainly go up. The 2019 total for this set of journals is 741,882 articles.

Last year, that range of publishers included 9,986 journals, which published 711,296 articles in 2019. So there’s a net gain of 1,214 added journals so far.

For this group of 1,600 journals–ignoring the first 9,600–problematic journals include 53 malware cases, 98 unreachable/unworkable, six non-OA journals (registration required), a few assorted situations, and 29 that had to be found at a different address. Some 20-odd of the 98 are almost certainly very temporary: the second half of a university’s journals all had DNS failures, the morning after the first half were fine.

Looking more closely at the malware cases for these 1,600 journals, there are seven security-certificate problems, three malware in general, six phishing and 36 Trojans.

How confident am I that we’ll reach a million articles? Well, the remaining 4,467 journals had 140,498 articles in 2019, so unless there are fewer articles in 2020 and no gain from the remaining newly-added journals (about 567 of them), it seems likely.
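
The arithmetic behind that estimate can be laid out as a short Python sketch. It assumes, purely for illustration, that the remaining journals publish exactly as many articles in 2020 as they did in 2019 and that newly added journals contribute nothing; the variable names are not from the study.

```python
counted_2020 = 860_506     # 2020 articles in the 11,200 journals checked so far
remaining_2019 = 140_498   # 2019 articles in the journals not yet checked
target = 1_000_000

# Assumption: the remaining journals simply repeat their 2019 output in 2020.
projection = counted_2020 + remaining_2019

# How much of their 2019 output the remaining journals must reach to hit the target.
needed = target - counted_2020
needed_ratio = needed / remaining_2019

print(f"projection: {projection:,}")
print(f"needed from remaining journals: {needed:,} ({needed_ratio:.1%} of 2019 output)")
```

On those assumptions the projection lands just above a million, so any growth among the remaining journals, or any articles from the newly added ones, adds margin.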

I’ve been running a bit ahead of expected schedule. That may slow down for a bit, for reasons that relate to April 15 and for other non-study reasons. But it’s looking good to complete the first scan by the end of April, followed by the slower second scan and final malware check…

By the way: I *love* to see other studies based on this work, including the spreadsheet, but if you’re planning such a study, please read the book– https://waltcrawford.name/goa5.pdf , or blow $11 on the color paperback at Lulu. There are some useful caveats and other subtleties that aren’t in the spreadsheet.

GOA6: Sixth Report

Friday, March 5th, 2021


Time for another GOA6 checkpoint, at 9,600 of 15,676–and this one’s a mixed bag.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked through publisher SpringerOpen and title Chinese Journal of Mechanical Engineering. So far, the 2020 article count is 789,685, and that will almost certainly go up slightly. The 2019 total for this set of journals is 672,583 articles.

Last year, that range of publishers included 8,614 journals, which published 640,867 articles in 2019. So there’s a net gain of 986 added journals so far.

For this group of 1,600 journals–ignoring the first 8,000–problematic journals include 46 malware cases, 86 unreachable/unworkable, one non-OA journal (registration required)–and the unfortunate part, 249 that had to be found at a different address. A few of those are DergiPark, but most are Sciendo, because parent company DeGruyter implemented a new website that broke all the links to journals it had moved to Sciendo, and hasn’t yet updated DOAJ records. (They’d all be unreachable, but I saw the problem and managed a workaround of sorts.)

I have thoughts about DeGruyter/Sciendo. I will not burden you with them.

Looking more closely at the malware cases for these 1,600 journals, there are eight security-certificate problems, one exploit, one spyware, one malware in general, seven phishing–and 28 Trojans.

Now, on to the next 1,600…



Gold Open Access 6: Halfway Mark

Monday, February 22nd, 2021


I’m now just over halfway through the initial journal scan for GOA6 (8,000 of 15,676), so it’s a good time for a quick progress report.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked through publisher PPCT and title Kimün. Revista Interdisciplinaria de Formación Docente. So far, the 2020 article count is 704,825, and that will almost certainly go up slightly. The 2019 total for this set of journals is 586,583 articles. [Yes, the substantial gain for 2020 appears to be legit: for one thing, quite a few MDPI journals saw substantial increases in articles in 2020. And there are quite a few more journals.]

Last year, that range of publishers included 7,155 journals, which published 569,645 articles in 2019. So there’s a net gain of 845 added journals so far.

Note that the remainder of last year’s journals accounted for 281,668 2019 articles, not quite half as many as the first half. So we could wind up with a million articles, but certainly not 1.4 million or close to it. (I’d say a million is probable unless rechecking shows major problems in the figures.)

For this group of 3,200 journals–ignoring the first 4,800–problematic journals include 129 malware cases (that’s out of 3,200–the malware numbers are bad this year, but the latest 1,600 didn’t make them much worse, adding 43), 146 unreachable/unworkable, as well as 47 that had to be reached at new addresses (entirely because DergiPark moved from .gov to .org). These will all be rechecked.

Looking more closely at the malware cases for these 3,200 journals [thus including the fourth report], there are nine security-certificate problems, one spyware, seven malware in general, 18 phishing–and 94 Trojans. Looking at countries in this 20%, I see 70 in Indonesia, 21 in Ukraine, nine in Serbia, three each in Brazil and Poland, two each in Colombia, Kenya and Turkey, and seventeen singletons.

Now, on to the next 1,600… and I might stop providing problematic-journal details, if the reports continue at all. Perhaps the most striking thing in this segment is that the 58 largest journals in the latest 1,600–mostly from MDPI, but with a few from Nature Publishing Group and others–went from 129,856 articles in 2019 to 184,357 in 2020.



GOA6: Progress Report 4

Thursday, February 11th, 2021


I’m now just over 40% of the way through the initial journal scan for GOA6 (6,400 of 15,676), so it’s a good time for a quick progress report.

Note that, as before, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked through Magazine House of Cancer Research on Prevention and Treatment; so far, the 2020 article count is 443,866, but that will almost certainly go up slightly. The 2019 total for this set of journals is 386,364 articles.

Last year, that range of publishers included 5,775 journals, which published 376,629 articles in 2019. So there’s a net gain of 625 added journals so far.

For this group of 1,600 journals–ignoring the first 4,800–problematic journals include 86 malware cases (that’s out of 1,600–the malware numbers are VERY bad this year), 88 unreachable/unworkable, as well as 23 that had to be reached at new addresses (entirely because DergiPark moved from .gov to .org). These will all be rechecked.

Looking more closely at the surprisingly high number of malware cases for these 1,600 journals, there’s one security-certificate problem, one spyware, six malware in general, 15 phishing–and 63 Trojans. Looking at countries in this 10%, I see one case each in Brazil and India, 54 in Indonesia (mostly Trojan, some phishing, almost all at academic institutions), one each in New Zealand, Pakistan and Romania, two in Russia, seven in Serbia, one each in South Korea, Taiwan, Thailand and Turkey, one in the UK–and 14 in Ukraine, 13 of them from LLC “CPC “Business Perspectives”–that is, a Trojan in the base software for all 13 of that publisher’s journals.

Looking back at the 174 xm cases in the first 4,800 journals, again by country, I see one each in Argentina and Bangladesh, four in Belarus, 43 in Brazil–35 of them from the same publisher, Conselho Nacional de Pesquisa e Pós-graduação em Direito (CONPEDI) (another root-software Trojan); one from Chile, three from Croatia, one from Cuba, four from Ecuador, two from Germany, and 75 from Indonesia (that’s in addition to the 54 in the latest 1,600); then one from Iraq, two from Mexico, one from Moldova, six security-certificate problems from BRILL in Netherlands, one each from Pakistan, Poland and Portugal, ten from Romania, seven from Serbia, one each from Slovakia and South Africa, three from Spain, and one each from Ukraine and the US. (Portugal? That was a top-level domain issue, and I’m now ignoring these–most are .info and shouldn’t be flagged at all.)

OK, way too much detail on malware issues, but they seem to be getting worse. Two years ago, before the pandemic, DOAJ and its contacts were able to correct nearly all malware cases. Not so last year, and I don’t know what will happen this year. At the end of this project for this year, I will send DOAJ a list of all journals with malware in both years and the suggestion that they be removed from the directory, possibly after one final attempt to get them to fix the problem.

Now, on to the next 1,600 and the halfway mark. (Yes, I’m still on pace to be finished with the first pass in late April; I hope that continues.)