Archive for 2021

Country supplement for GOA6 is out

Monday, July 12th, 2021

Gold Open Access by Country 2015-2020 is now available for free download or nominal ($7.50 in this case) purchase as a 284-page paperback.

This year, both the front and back cover feature OA heatmaps–but different ones. The front cover shows no-fee or “diamond” OA by 2020 articles per 100,000 people, with minimum values green and maximum deep gold or red. (The software used, GunnMap, is less sophisticated than GunnMap2, which requires the no-longer-supported Adobe Flash, and partly as a result does not have a legend.) The rear cover shows fee-based 2020 articles–but logarithmically, and “reversed” (lowest is yellow, highest is deep green for money).

The PDF ebook (free) is available on my website: https://waltcrawford.name/g6cntry.pdf.

The trade paperback is available from Lulu at https://www.lulu.com/en/us/shop/walt-crawford/gold-open-access-by-country-2015-2020/paperback/product-eqzgjg.html?page=1&pageSize=4

As always, all links are available at the project page, https://waltcrawford.name/goaj.html

GOA6: June 2021 status report

Friday, July 2nd, 2021

As of July 1, 2021, as far as I can tell:

GOA6:

  • Overall report: 212 PDF copies (no books other than my copy)
  • Dataset: 62 views, nine downloads

GOA5:

  • Overall report: 717 copies (two books)
  • Countries: 129 copies (no books)
  • Dataset: 635 views, 115 downloads

Gold Open Access 2015-2020 (GOA6) is out

Wednesday, June 16th, 2021

Gold Open Access 2015-2020: Articles in Journals (GOA6) is out now.

Key figures: a million and a billion–more than one million articles in 2020, and considerably more than $1 billion in possible fees.

The book, a 243-page trade paperback with color figures, is available for $10.50 (plus shipping) from Lulu at https://www.lulu.com/en/us/shop/walt-crawford/gold-open-access-2015-2020-articles-in-journals-goa6/paperback/product-2ndqwe.html?page=1&pageSize=4. (The price is $10.50 in the US; Lulu requires that I set prices in Euros, Pounds, Australian Dollars and Canadian Dollars as well, so I’d guess Lulu prints and ships locally in many countries–and I set the price rounded up to the nearest .5 or 1 from printing costs in each case. I clear anywhere from $0.08 to $0.40 depending on currency.)

I mention the printed book first because I really believe it’s the best way to browse through my analysis and occasional comments–and having a few book sales, while obviously irrelevant for income, would encourage me to keep doing these exhausting exhaustive studies (if SPARC continues to support them).

But most of you will prefer the free PDF, available at https://waltcrawford.name/goa6.pdf. It’s precisely the same content as the printed book: the same PDF used for the book, with the front and back covers added.

The dataset is also available at Figshare, at https://figshare.com/articles/dataset/Gold_Open_Access_6_2015-2020/14787888. Since it’s more than 15,000 rows (plus additional worksheets showing currency conversions, codes, and excluded journals), you’ll want to download it.

Everything has Creative Commons BY (attribution) licenses, so you can use them as desired. But if you send other people links rather than redistributing the book or the dataset directly, I can track usage, which is nice.

The country-of-publication book will be ready in a few weeks.

As always, all links are available at the Gold Open Access page at https://www.waltcrawford.name/goaj.html.

GOA6: Data gathering complete

Thursday, May 20th, 2021

I just finished the final data-gathering pass–rechecking all problematic journals. I was able to clear 105 additional journals and found 8 more that either had no post-2014 activity or were no longer in DOAJ. That left 639 xm journals and 91 xx journals. I’ve concluded that defective journals that are still defective after six checks over two years, or that had no post-2018 article counts, should not be included in the overall analysis. In all, 492 journals were excluded, leaving 260 malware and 74 unavailable/unworkable journals, all with article counts from DOAJ, retained for the analysis.

The big numbers: 15,130 fully analyzed journals, 69.7% of them without fees (“diamond” if you like). 14,175 of those showed 2020 articles (68.8% no-fee), for a total of 1,061,256 articles. The bad news: while the percentage of no-fee journals has stayed about constant at nearly 70%, the percentage of no-fee articles has fallen significantly, from 39% for 2019 articles (in GOA5) to 35.5% for 2020: in essence, nearly all the 2020 growth was in fee-charging journals. NOTE: These numbers may change slightly during additional checking–e.g., two journals have moved to the Excluded category, because neither had any post-2014 articles.

If you’re wondering: excluding all xm and xx journals would reduce the 2020 article count by 5,268 articles (but, of course, I only have article counts where journals were reporting them to DOAJ)–and including all xm and xx journals would increase the article count by 4,315 (same caveat applies). In other words, these decisions have almost no impact overall.

I finished the data analysis on the same day as I did last year, despite having more than 1,000 additional journals: that speaks to fewer health and other interruptions, perhaps cleverer counting techniques, and perhaps fewer journals making it really hard to count articles. (Although some do try–including one where literally the only way to find dates is to read the articles and look for the recommended citation form!)

Next: add derived data, move columns around, and start the data processing and book writing. Anticipated completion date: somewhere around June 24-July 4, perhaps 2-3 weeks later for the country book.

I haven’t done a usage report for GOA4 and GOA5 for a while, and now that there’s a hosted copy of the GOA5 dataset with a dashboard elsewhere, I’m not sure how useful such reports are. I do know that book copies have declined considerably, from over 4,000 for GOA4 to around 740 for GOA5 (including two printed books). I find that discouraging, especially since the book includes caveats that aren’t in the dataset.

Meantime, on with the show. I surely hope we don’t hit “double 70” next year–with 70% of serious OA journals “diamond” but 70% of the articles appearing in the fee-charging journals….

GOA6: Second part of second pass

Monday, May 10th, 2021


I just finished the second part of the second data-gathering pass–rechecking 732 journals that had problems other than malware. Most of those problems were resolved, either because the journal’s host fixed a problem or because I could reach the journal at a different URL by searching on title and ISSN. The pass found 21 no longer in DOAJ and left 126 to be rechecked one last time. [The check of journals against those removed from DOAJ since January 1, 2021 yielded 40 cases–and this part of the scan yielded another 21 no longer in DOAJ. Since many journals have two ISSNs, I only save one, and the removal list only provides one, this oddity is not surprising.]

At this point, there are 1,049,954 articles from 2020 and 879,377 from 2019, from 14,635 fully-analyzed journals–numbers that probably won’t grow a lot.

Next step: a quick check of xm/malware journals for resolutions–then, beginning May 15, the final check.



GOA6: First part of second pass

Sunday, May 2nd, 2021

I just finished the first part of the second data-gathering pass–rechecking slightly more than 2,200 journals that (a) seemed likely to have another 2020 issue appear in early 2021 or (b) had less than 2/3 as many 2020 articles as 2019.
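Criterion (b) above is simple enough to state as a one-line test. Here is a minimal sketch, assuming each journal has integer article counts for 2019 and 2020; the function name `needs_recheck` is made up for illustration and is not part of any actual GOA6 tooling. Criterion (a) is a judgment call made during the scan and is not modeled.

```python
def needs_recheck(articles_2019: int, articles_2020: int) -> bool:
    """Return True when the 2020 count is less than 2/3 of the 2019 count.

    Integer arithmetic (3 * a2020 < 2 * a2019) avoids floating-point
    rounding at the exact 2/3 boundary.
    """
    return 3 * articles_2020 < 2 * articles_2019


# Hypothetical counts, for illustration only.
print(needs_recheck(30, 19))  # 19 is below two-thirds of 30
print(needs_recheck(30, 20))  # 20 is exactly two-thirds of 30, so not flagged
print(needs_recheck(12, 0))   # 2019 articles but none for 2020
```

Note that any journal with 2019 articles and no 2020 articles is always flagged under this rule, while a journal with no articles in either year is not.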

In all, counts were changed in 690 of the journals–mostly added 2020 counts, but with some changes for 2019 (and, rarely, earlier years) as well. At the end of the process, the 2,200-odd journals show 56,268 articles for 2020 but 92,225 for 2019, since (b) above meant that every journal with 2019 articles and no 2020 articles was rechecked.

Naturally, the success rate for additional articles declined as the scan progressed, since the time lapse between scans shrank. Counts changed for 273 of the first 550 journals; 207 of the second 550; 106 of the third 550 and 104 of the last 560+.
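For anyone who wants to see the decline as percentages, here is a small sketch using only the batch figures quoted in this post. The variable names are mine, and the last batch is left out of the rate calculation because its exact size ("560+") isn't stated.

```python
# (label, journals with changed counts, journals in batch), from the post.
batches = [
    ("first 550", 273, 550),
    ("second 550", 207, 550),
    ("third 550", 106, 550),
]
last_batch_changed = 104  # batch size is only given as "560+"

# The four batch figures should add up to the overall number of changed journals.
total_changed = sum(changed for _, changed, _ in batches) + last_batch_changed

# Share of journals in each fully specified batch whose counts changed.
rates = {label: round(100 * changed / size, 1) for label, changed, size in batches}

print(total_changed)
for label, pct in rates.items():
    print(f"{label}: {pct}%")
```

The total works out to the 690 changed journals mentioned above, and the rate drops from roughly half of the first batch to under a fifth of the third.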

At this point, 14,054 journals are in place for full analysis, with 1,028,737 articles for 2020 and 860,799 for 2019.

Next step: check the remaining 1,612 journals against the list of journals removed from DOAJ between January 1 and May 2. Then recheck all the journals that were problematic for some reason other than malware, since most of those should be transitory problems.

GOA6: First pass completed

Wednesday, April 21st, 2021

First things first: If you’re in a position to help resolve some of the very large number of journals with malware (787) or ones that were unreachable or unworkable (752), there’s a spreadsheet with the key information for all of them here:

https://docs.google.com/spreadsheets/d/19gXpn3kVn-R33uDdOUSHgssPaLO5CEPRWUGPvDB9az0/edit?usp=sharing

And I won’t do the final piece of the multistep “second pass” until at least May 15. Help from folks with colleagues in Indonesian or Brazilian academia would be most helpful. (The spreadsheet, g6x, is sorted first by code, then by country, then by publisher, then by journal. The second page lists the codes and notes.)

Here’s where things stand. 15,666 journals had 1,018,364 articles in 2020, up from 890,069 in 2019. The 2020 number will rise somewhat, both because some journals are late to publish issues and because the numbers don’t include *any* of the malware and unreachable journals (but 2019 numbers do). For GOA5, the 2019 total was 854,018 articles.

The 15,666 (yes, I know, I say 15,667 sometimes–it’s hard to remember to subtract one for the row of labels) include:

  • 13,391 “a”–regular–journals
  • 317 “bi”–no articles in 2019 or 2020, mostly ceased, renamed, changed publishers or otherwise disappeared
  • 3 bm–early cases of journals with malware that could be reached through other addresses
  • 343 bx–journals available at a different URL than the one in DOAJ. There will probably be quite a few more of these; nearly all at present are either Sciendo (from DeGruyter) or dergipark, moved from .gov to .org without generally changing DOAJ records.
  • 58 xd: journals with no articles later than 2014, most of them “duplicates” that have been superseded.
  • 787 xm: Malware
  • 14 xn: Apparently not OA.
  • 1 xt: A website I couldn’t translate or make enough sense of to count
  • 752 xx: Unreachable (404, etc.) or unworkable (db errors, etc.)

So far, I see 4,371 journals with fees, 9,706 with no fees, and a few hundred needing rechecking (mostly newly-added journals that are xm or xx).
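As a quick sanity check on the code tallies listed above, here is a short sketch that adds them up. The dictionary simply restates the counts from the list; treating every code that does not start with "x" as usable for analysis is my assumption for this illustration, not a rule stated in the post.

```python
# Journal counts by status code, copied from the list above.
tallies = {
    "a": 13391,   # regular journals
    "bi": 317,    # no articles in 2019 or 2020
    "bm": 3,      # malware, reached through other addresses
    "bx": 343,    # available at a different URL than DOAJ's
    "xd": 58,     # no articles later than 2014
    "xm": 787,    # malware
    "xn": 14,     # apparently not OA
    "xt": 1,      # untranslatable website
    "xx": 752,    # unreachable or unworkable
}

total = sum(tallies.values())

# Assumption: codes beginning with "x" are the ones not (yet) analyzable.
non_x = sum(count for code, count in tallies.items() if not code.startswith("x"))
problem = tallies["xm"] + tallies["xx"]

print(total, non_x, problem)
```

The categories do add up to 15,666, with 14,054 in the non-"x" codes and 1,539 in the malware and unreachable codes combined.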

Now, after ignoring journals for a day or two, I’ll recheck 2,211 journals for added issues/articles and 1,613 to try to clear malware and unreachable cases. (The 2,211 includes 946 cases marked along the way and 1,265 where there were at least 1.5 times as many articles in 2019 as in 2020–the original version of this paragraph had incorrect numbers here; fortunately, the correction means fewer to check.)

As already noted, the final malware pass will start no earlier than May 15. If all goes well, the primary book and spreadsheet should be ready in very late June or early July.

GOA6: Ninth Update

Saturday, April 10th, 2021


Time for another GOA6 checkpoint, at 14,400 of 15,676.

Note that, as always, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked partway through the University of Isfahan. So far, the 2020 article count is 942,685, and that will go up. The 2019 total for this set of journals is 830,018 articles.

Last year, that range of publishers included 12,835 journals, which published 792,068 articles in 2019. So there’s a net gain of 1,562 added journals so far. A million overall articles still seems likely, but not certain.

For this group of 1,600 journals–ignoring the first 12,600–problematic journals include 308 malware cases and 75 or so unreachable/unworkable. Yes, that’s a terribly high malware ratio.

Looking more closely at the malware cases for these 1,600 journals, there are ten security-certificate problems, seven ransomware, ten malware, 24 phishing and 256 Trojans.

The problem is mostly Indonesia: 842 of the 1,600 journals in this group are from Indonesia, and 281 of those have malware, mostly at the root URL for a university’s set of journals.

I checked all 14,400 journals scanned so far. Of 764 total malware cases, 481 are from Indonesia. Brazil is a distant second at 121, with smaller clusters from Romania and Spain (and a few cases elsewhere). Yes, Indonesia has more DOAJ-listed journals than any other country, but 481 of Indonesia’s 1,745 (so far) are problematic; Brazil has the second-most journals, and 121 of 1,578 are problematic. (All these figures exclude the remaining 1,276 journals–but only 26 of those are from Indonesia.)
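To put those malware counts in proportion, here is a small sketch that turns the figures in the paragraph above into percentages. It uses only numbers quoted in this post (as of the 14,400-journal checkpoint); the variable names are mine.

```python
total_malware = 764  # malware cases among the 14,400 journals scanned so far

# country: (malware cases, DOAJ-listed journals scanned so far)
countries = {
    "Indonesia": (481, 1745),
    "Brazil": (121, 1578),
}

# Share of each country's own journals that have malware problems.
rate_within_country = {
    name: round(100 * cases / journals, 1)
    for name, (cases, journals) in countries.items()
}

# Share of all malware cases that each country accounts for.
share_of_all_malware = {
    name: round(100 * cases / total_malware, 1)
    for name, (cases, _) in countries.items()
}

print(rate_within_country)
print(share_of_all_malware)
```

By this arithmetic, more than a quarter of Indonesia's journals scanned so far are affected, against under 8% of Brazil's, and Indonesia alone accounts for a little under two-thirds of all malware cases.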

I believe attempts have been made to alert publishers to malware problems. Some may be made again this year. This is a continuing problem.

I’d say it’s now nearly certain that the first scan will be done in late April, barring illness or other unexpected events. That would leave some checking and the long rescans. (So far, about 2,200 journals need rechecking; the final number will probably exceed 2,300. Rechecking can be a slow process.)

So no overall target date yet…



GOA6: Update 8

Tuesday, March 30th, 2021


Time for another GOA6 checkpoint, at 12,800 of 15,676.

Note that, as always, I sort journals by publisher before checking–because many multijournal publishers use the same templates for all journals, making it easier for me to find fee data and do article counts.

For GOA6, that means I’ve now checked through publisher Universidade Federal do Rio de Janeiro and title Mana. So far, the 2020 article count is 911,525, and that will almost certainly go up. The 2019 total for this set of journals is 741,882 articles.

Last year, that range of publishers included 11,402 journals, which published 758,050 articles in 2019. So there’s a net gain of 1,398 added journals so far.

For this group of 1,600 journals–ignoring the first 11,200–problematic journals include 53 malware cases and 109 unreachable/unworkable.

Looking more closely at the malware cases for these 1,600 journals, there are eight security-certificate problems, five phishing and 39 Trojans–including seven at Universidade Federal de Alagoas and five at Universidade Estadual de Montes Claros.

How confident am I that we’ll reach a million articles? The remaining 2,866 journals had 95,177 articles in 2019, so it’s not certain, but likely. We shall see…

This is an interesting segment, nearly all university journals from Latin American countries or Spain and Portugal. [Actually, one from Sweden, 32 from Portugal, 160 from Spain and all the rest from 18 Latin American countries, with Brazil accounting for 743.] Unsurprisingly, that also means an even higher percentage of no-fee/diamond than overall (likely to be around 70%): of the 1,434 journals fully analyzed out of this 1,600, only 48 have fees.

I’d say it’s now very probable that the first scan will be done in late April, barring illness or other unexpected events–other things are taking up more time, but some 400 of the remaining 2,866 should be relatively fast. We shall see. That would leave some checking and the long rescans. (So far, about 1,800 journals need rechecking; the final number will probably exceed 2,000.)

So no overall target date yet…



Angry?

Friday, March 26th, 2021

Just for fun, I’ve been going through my listening collection–all ripped from owned CDs using MusicBee to FLAC, played back on a Cowon Plenue high-fidelity player–by “genre,” presumably supplied by crowdsourcing to whatever metadata database MusicBee uses. (Background)

Last night, I finished one odd genre and scrolled to the next: Angry.

So what’s included (from my collection, that is)?

One album: No Secrets, by Carly Simon.

Really? Angry? The album shows a confident, talented woman. One song (the basis for the album title) shows her disappointed in her lover/boyfriend/spouse/whatever. Another, the big hit, is “You’re So Vain.” Of the songs on the album, those are as close as I could come to anything even resembling anger, and you’d really be stretching it in either case (especially the latter, which I still love).

My thought went out to whoever supplied that genre: I hope you got help.