Archive for 2022

Gold Open Access by Country 2016-2021 is out

Wednesday, June 22nd, 2022

Gold Open Access by Country 2016-2021: The Long Tail is now available as a free PDF ebook or a $7.50 trade paperback.

As always, links to the PDF and to the paperback are at https://waltcrawford.name/goaj.html

I’ll have more to say about it later, probably. For now, I’ll say that focusing on the long tail offers what I regard as a more realistic view of where OA is happening, by excluding the “big eleven” megapublishers. [For one example: including all publishers, Switzerland is by far the largest source of OA articles and Brazil is fourth; for the long tail, Brazil is first, Indonesia is fourth, and Switzerland is 32nd. The UK and US are 2nd and 3rd in both cases–but the US is second in the long tail, while the UK is second including all publishers. For another: the average cost per article is $1,374 for all publishers and $336 for the long tail.]

No graphs, a ridiculous number of tables, 259 pages on 60lb. cream paper, with what may finally be an interesting heatmap on the front cover.

Doing another project this year?

Saturday, June 4th, 2022

I’ll be done with GOA7 in a few more weeks (late June or early July barring major surprises), and will probably spend a few months reading a lot more, watching a little more TV, possibly dealing with some household and personal maintenance issues, and determining whether to propose GOA8.

That last depends on whether I believe I can do a good job (am up to it mentally, physically, and in terms of other demands), whether it still seems to be valuable, and whether I’d still have funding.

Meanwhile–let’s say in the time between July 15 and December 15–I could take on another project, if there was one that made sense for all concerned. That is: something where my remaining skills would yield worthwhile results, that wouldn’t be stepping on Proper (Paper-Oriented) Research, that would be financially supported, and would interest me.

I don’t know what that might be, if anything, but thought I’d put it out there. I thought about investigating the “rest of ROAD,” that is, what are all those other OA journals, why aren’t they in DOAJ, do they publish a lot of articles…etc. But ROAD doesn’t appear to have downloadable metadata, and I’m not sure where such a project would lead.

That’s one example. There might be others. If you’re interested, get in touch (waltcrawford@gmail.com). I won’t be holding my breath.

GOA7: It’s out!

Monday, May 30th, 2022

I’m pleased to announce publication of the first three of four deliverables for Gold Open Access 2016-2021: Articles in Journals (GOA7).

The color paperback is available for $10.50 US, and comparable amounts in other currencies supported by Lulu. The link’s a little long, but going to Lulu.com and searching goa7 will get you right to the page. (Here’s the link: https://www.lulu.com/en/us/shop/walter-crawford/gold-open-access-2016-2021-articles-in-journals-goa7/paperback/product-ydvyrz.html ) I profit by anywhere from $0.10 to $0.60 depending on what currency you use (sorry, Canadians).

The pdf–exactly the same body content as the book, but preceded by the front and back covers–is available for free from my website.

The spreadsheet is available at Figshare, but you can also download it from my website.

These are all CC-BY licensed: do what you wish with them, as long as you name the source–and it’s kind to point people to the originals, so I have some idea of usage.

[I’d love to see a few copies of the paperback sold–and I’m almost a bit surprised that some i-school or library school that cares about OA doesn’t have a set of these studies, but that’s just ego on my part, I guess.]

All of these links, and links to all past studies, are at https://waltcrawford.name/goaj.html

The fourth piece? (Gold Open Access by Country 2016-2021: The Long Tail) I’ll be starting on that later this week. The next post on this blog asks pertinent questions about that study.

Gold Open Access by Country: two quick questions

Sunday, May 29th, 2022

[UPDATE June 3, 2022: No responses were received here. Two responses were received to a shorter related tweetstream. Neither response convinced me that either table is particularly useful in the Country book, and they won’t be included there. The data remains, and I’ll probably retain it in future datasets if any.]

I’m close to finishing GOA7 and thinking about the country book. I have two questions. Responses (email or comment) by May 2 would be most helpful. (Relatively few people download the country books, but I’m hoping…)

1. Are the starting-date tables useful?

They’re already tables with five broad date ranges rather than graphs with two-year increments, but I wonder whether they serve any purpose at all.

[Inclination: to remove them.]

2. Are the publisher category tables useful?

This is especially a question given that this country book excludes the Big 11–but then, the fact that the Big 11 include one society, two universities, and two OA publishers along with six traditional publishers–and that one OA publisher now has more articles and revenue than the biggest traditional publishing group, which swallowed up at least two OA publishers–makes me really question the usefulness of these tables.

I’m going to be asking that second question about GOA8 as well (if there is one).

In this case–for future GOA editions, if any–dropping the tables would also mean dropping the PubCat column, but not publisher names.

[Inclination: to remove them.]

Removing both may reduce the book size slightly, by making it possible to do more two-page profiles, and will leave a bit more room for commentary in other cases.

Let me know what you think. If you’re not aware that the country books exist, well, that’s a different issue.

GOA6: Usage Update

Wednesday, May 25th, 2022

As of May 26, 2022, as far as I can tell:

GOA6:

  • Overall report: 1,457 PDF copies (no books other than my copy)
  • Countries: 205 PDF (no books)
  • Dataset: 553 views, 94 downloads

GOA5:

  • Overall report: 1,034 copies (two books)
  • Countries: 253 copies (no books)
  • Dataset: 1,246 views, 172 downloads

GOA7 should be out within the next two weeks. I’ll stop tracking GOA5 at the end of the summer.

GOA7: Quick update

Friday, May 20th, 2022

I’m making good progress on GOA7–the book. Barring huge disruptions in the next few weeks–not a safe bet–it should be ready (and the dataset published) in early June. Possibly even very late May, but don’t quote me on that. (Then, after resting for a couple of days, comes the subject book.)

I don’t include political commentary in the book, but not because I don’t have feelings, politics in this case being the politics of OA publishing and funding. So there won’t be any notes on the subversion of the OA vision by Big Publishers, even if the potential revenue from author-side fees did increase by nearly half a billion dollars from 2020 to 2021 (to roughly one and three quarters billion).

And “Big Publishers” is a tricky term in this case: MDPI, not a traditional publisher at all, appears to have taken in around $540 million in 2021–up more than $200 million from 2020 (partly by publishing a lot more articles, partly by a $255 increase in average cost per article). MDPI now publishes more DOAJ-listed OA articles than all of the Holtzbrinck Group (Springer, Nature, Frontiers, BMC).

But the book is, as usual, mostly lots of tables and graphs with limited commentary–describing what is, not what I think it should be.

Enough for now. Back to the book.

GOA7: Preliminary baseline

Thursday, May 5th, 2022

I believe I’ve now completed the online work for Gold Open Access 2016-2021 (GOA7), to be followed by a day or three of consistency/typo checking, a few days of adding data (persistent DOAJ urls for ongoing work, GOA6 fees and status for comparisons, and various columns of derived data), and several weeks of massaging data and preparing the book. Current hope is mid- to late June for the main book and figshare dataset, a few weeks later for the new “long tail” country book. I’m nearly certain the main book will not be ready in May, and it’s possible that emergencies and problems could push it into July, but “sometime in June” is probable.

So where do things stand, with the understanding that consistency checks may cause numbers to shift very slightly?

Refining Problematic-Journal Coding

Last year, the xm (malware) and xx (unavailable/unworkable) codes included journals with the same problem for two or more years, which were excluded, and those where it was new, which were included.

This year, I refined the coding–adding a few new codes, all of which result in exclusion from the overall study:

  • x2: xm in one year, xx in another. One journal, no 2021 articles.
  • xm2: Malware this year and last. 383 journals (of which 47 come from Brazil, 276 from Indonesia, and 23 from Ukraine), of which DOAJ says 193 had 2021 articles, a total of 5,367 2021 articles.
  • xmi: Malware this year and no articles later than 2019. Nine journals.
  • xo: No longer in DOAJ and problematic in some other way. 119 journals.
  • xx2: unavailable/unworkable this year and last. Twenty journals, two with 2021 articles (22 articles).
  • xxi: unavailable and with no DOAJ-listed articles since 2019. 27 journals.

So the excluded page in the eventual Figshare spreadsheet will include 658 journals (including 89 xd and 10 non-OA journals)–about 160 more than last year, but 119 of those are no longer in DOAJ, so this is actually an improvement.
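For readers who want to check the tally, here is a minimal Python sketch that adds up the per-code journal counts quoted above. The dictionary name and the "non-OA" label are illustrative only; the counts are the ones stated in this post.

```python
# Journal counts per exclusion code, copied from the list and paragraph above.
# The key "non-OA" is an illustrative label, not one of the study's codes.
excluded = {
    "x2": 1,
    "xm2": 383,
    "xmi": 9,
    "xo": 119,
    "xx2": 20,
    "xxi": 27,
    "xd": 89,
    "non-OA": 10,
}

total_excluded = sum(excluded.values())
print(total_excluded)  # 658
```

The sum matches the 658 journals expected on the excluded page.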

The most encouraging thing is that there are relatively few new malware cases: 142 in all, compared to 260 last year. Of the 142, 96 are from Indonesia; no other country has more than five. There are slightly more unavailable/unworkable cases (90 compared to 75), but that’s not bad.

The Baseline

Subject to small further refinement, here’s what I see, by code:

Code     Journals   2021 content   2021 articles
a          15,305         14,876       1,242,250
bi            391
bx            699            666          29,096
xm            142             85           2,600
xx             90             16           1,124
Total      16,627         15,643       1,275,070

Again, subject to refinement…but probably not major changes. This compares to last year’s 15,128 fully analyzed journals and 1,061,256 2020 articles.
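As a quick consistency check on the baseline table, the following Python sketch re-adds each column. It assumes the blank cells in the bi row (journals with no articles since 2019) can be treated as zero; the variable names are illustrative.

```python
# Rows from the baseline table: (journals, journals with 2021 content, 2021 articles).
# Assumption: the blank "bi" cells are treated as 0.
baseline = {
    "a": (15_305, 14_876, 1_242_250),
    "bi": (391, 0, 0),
    "bx": (699, 666, 29_096),
    "xm": (142, 85, 2_600),
    "xx": (90, 16, 1_124),
}

# Sum each column across all codes.
totals = tuple(sum(column) for column in zip(*baseline.values()))
print(totals)  # (16627, 15643, 1275070)
```

All three sums agree with the Total row.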

GOA7 Pass 2: Updating

Friday, April 22nd, 2022

I’ll add to this post as I progress through Pass 2…

April 22: Parts 1 and 2

Part 1 (adding possible 2021 articles to journals that had none) and Part 2 (adding later 2021 issues to journals that seemed as though they should have more) have both been completed, adding around 3,500 2021 articles and increasing the count of journals with 2021 articles by around 100.

The key numbers now, excluding Parts 3 and 4 of Pass 2, are:

Journals that won’t be scanned further: 14,472.

Journals with 2021 articles: 14,675

2021 articles: 1,233,706.

April 22 (2): Journals removed

I reconsidered when journals no longer in DOAJ should be removed, doing this for Part 3 and Part 4 just now. (I went back to June 2021 for removal dates, but in fact all journals removed were done in 2022.)

In all, 62 journals were removed from Part 3 (xx), leaving 1,044 to be rechecked, and 15 were removed from Part 4 (xm), leaving 659 to be rechecked. These 77 journals–marked “xo”–will not be included in the study, although they may be included in one table in Chapter 2 (Exclusions and Special Cases).

April 30: Part 3 complete

The 1,044 xx journals have been rechecked, with reasonably good success. Although there were 312 more cases than in last year’s scan, the number that couldn’t be resolved only increased from 126 to 147. Of the remainder, 177 were fine when retested (which usually means temporary server problems); 41 were either fine or found on an alternate path but hadn’t published since 2019; 651 were found and counted using alternative routes; 10 were dead/duplicates; three aren’t OA journals (two now require login); and 35 were no longer in DOAJ. This was a “lumpy” pass: 223 xx cases came from Sciendo, 80 arose from the DergiPark move from .gov to .org, and 55 came from SciELO instances where URLs hadn’t been updated.

The current totals for non-problematic journals: 16,374 total; 15,446 with 2021 articles; 1,268,018 2021 articles; 385 with no post-2019 articles; 89 dead/duplicate cases (no articles since 2015). Of all these, 2,162 appear to be new to DOAJ and 14,212 are continuing.

Next step: Part 4 (xm), and a quick recheck on some items. Yes, I’m about 10 days ahead of last year. Cross fingers.

May 3: Part 4(a)

I’ve gone through the 658 xm (malware) journals, with–as expected–modest results: 50 now active, 5 OK but code bi (inactive since 2019), one bx (found through a different url), one xd (dead/duplicate), and one xo (no longer in DOAJ).

At the moment, 16,431 journals are ready for processing, 15,496 of which have 2021 articles; there are 1,269,933 2021 articles.

The remaining 595 xm (malware) journals will have codes compared with those in last year’s study, as will the xx/xm journals previously double-checked: last year’s codes help inform whether journals are included in the full study or held out as exclusions (which appear on a separate page of the eventual Figshare spreadsheet). Then, 601 xm journals that weren’t also xm/xx last year will be checked for the possibility of an alternate route. I’d be pleasantly surprised to find many, but it’s worth the two or three days required.

A couple of notes about the current malware group (excluding 10 that changed from xx to xm in the last phase):

Big clusters by publisher include 37 from Universitas Udayana; 29 from Conselho Nacional de Pesquisa e Pós-graduação em Direito (CONPEDI); 28 from Universitas Negeri Malang; 19 from Diponegoro University; and 14 from Universitas Pendidikan Indonesia.

You may notice something about all but one of those names–and, indeed, breaking down the malware by country shows 403 from Indonesia–just over two-thirds of the total–plus 69 from Brazil and 28 from Ukraine. No other country has more than eight.

May 5: Completion of online scans

I’ve now rechecked xm journals looking for possible alternate URLs and a few other checks, with some success–and gone through the complete dataset getting rid of pure duplicates (either from downloading issues or otherwise).

While these numbers may change very slightly as I do consistency checks in the next day or two, I’d guess such changes will be very small–probably less than 1%. I also have a more nuanced understanding of the malware and problematic issues, and it’s encouraging. I’ll lay that out in a separate post–but if you just want the biggest numbers, the final report is likely to include around 16,726 journals (with another 440 on an exclusions page), of which around 15,643 have 2021 articles, with a total of around 1,275,080 2021 articles. All figures subject to change.

GOA7: First pass complete

Wednesday, April 20th, 2022

I’ve finished scanning the 17,302 journals for Gold Open Access 2016-2021.

At the end of that pass, there are 14,572 journals with 2021 articles recorded, for a total of 1,231,397 2021 articles.

But, as usual, there were a lot of problematic journals–1,106 that were either unavailable or not working properly, and 674 with malware or security-certificate issues. These will all be revisited, as will 445 journals that showed no 2021 articles (but no signs of problems) and 225 where at least one 2021 article appeared but it seemed likely that there should have been more.

Oddly enough, I completed Pass 1 on the same date (April 20) as last year, despite checking just under 2,000 more journals. I credit that to fewer emergencies (so far), consistently good broadband and computer performance, restoring use of a direct Excel-to-browser function that had stopped working, and more consistency in many journal webpages. (I also tweeted every day on progress, mostly as a personal goad. It worked.)

Comparing this year’s Pass 2 to last year’s:

  • Last year, I checked 2,211 journals for possible added articles; this year, I’ll be checking 445 that had no 2021 articles and 225 that seem likely to have more. That really compares to the 946 last year flagged during the pass: I was keeping track of comparable numbers and saw no reason to do another algorithmic pass. (See pp. 230-231 of GOA6 if you want to know what that’s about.) This scan goes rapidly; I’d hope for considerably less than a week.
  • I’ll be deleting problematic journals removed from DOAJ since 1/1/2022 after the remaining pieces rather than before: that should not affect more than 30-40 journals, and since the intent is to be an “end of 2021” snapshot, it seems reasonable.
  • The scan for “xx”–unavailable or unworkable–will involve 1,106 journals, much worse than last year’s 732. Quite a few of these are 404s because DergiPark (Turkey) stopped autoforwarding from its old .gov.tr domain to its new .org domain; a few more are because of an oddity with one SciELO instance that means if you already have browser tabs open for two SciELO journals, it rejects any other attempts. Those can all be fixed, and I hope to clear up several hundred of the xx cases (some clear themselves up–e.g., one university’s server was apparently down on one day). This may be a slow process (the 732 took a week).
  • The scan for “xm” (malware and certificate issues) will involve 674 journals, slightly better than last year’s 781, but still about 674 too high. That process, and additional checking for recalcitrant “xx” cases, may take a while. Last year, I completed the final scan on May 19; I’ll be delighted if I do as well this year. After that comes a few days of data normalization and about a month to prepare the book and mount the dataset at figshare.

So, well, no real target date, but if emergencies continue to be few and mild, the data and main book might–might–be ready in June. (The country book, which will be very different this year because it will focus on the “long tail,” journals not published by one of the Big 9 or 10, would be ready a few weeks later.)

I may continue to tweet progress reports (I’m always waltcrawford), probably not every day. And if you or your institution want to encourage the continuation of this series, consider buying one or all of the trade paperbacks at lulu.com. I won’t get rich (they’re priced by rounding production cost up to the next 50 cent mark), but I spend a lot of care on making sense of the data and think the print book is a good way to see what I’ve found. But, of course, there will also be a free PDF of each book at my website and a free dataset at figshare, both CC-BY.

Now, to start Pass 2.

Oh: my prediction for overall article count is “probably around 1.3 million”–that is, somewhere between 1.23 million (no articles added in Pass 2: very unlikely) and a few tens of thousands more.

Incidentally: this scan included 15,055 journals that continued from previous years and 2,247 added to DOAJ in 2021 (most of them not new that year). As always, a few hundred journals disappeared–and all but about 120 were explicitly removed from DOAJ during 2021.

GOA7: Three-quarters progress report

Saturday, March 26th, 2022

I’ve scanned 13,000 journals so far–just over three-quarters of them all. (There are 4,302 left to do. Stopping at 12,900 would have left 4,402, just over one-quarter.)

At a similar point in last year’s scan (arranged by publisher and journal), there were 11,386 journals. I show 1,790 newly-added journals so far, so that suggests about 176 removed or missing. [Journals change publishers and thus locations in the spreadsheet, so removed/added figures can change either way.]
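The "about 176" figure follows from simple subtraction. A minimal Python sketch of that arithmetic, with illustrative variable names and the numbers quoted above:

```python
# Figures quoted in this post; the variable names are illustrative.
journals_at_this_point_last_year = 11_386
newly_added_so_far = 1_790
scanned_so_far = 13_000

# If nothing had been removed, the scan would have covered last year's
# journals plus the new ones by now; the shortfall is the estimate.
expected_if_none_removed = journals_at_this_point_last_year + newly_added_so_far
removed_or_missing = expected_if_none_removed - scanned_so_far
print(removed_or_missing)  # 176
```

As the bracketed note says, publisher changes move journals around in the spreadsheet, so this is only a rough indicator.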

The 2021 article count to date is 1,069,316. The last quarter of journals tends to have many fewer articles: there were just under 157,000 2020 articles in the remaining group. So it’s fair to assume that there will be more than 1.2 million 2021 articles, but maybe not a lot more–I’ll stick with 1.3 to 1.4 million as a vague guesstimate.
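To see where "more than 1.2 million" comes from, here is a rough Python sketch. It assumes, purely for illustration, that the remaining journals publish about as many 2021 articles as the "just under 157,000" 2020 articles recorded for that group; it is a floor-style estimate, not a forecast method used in the study.

```python
# Count to date, as quoted above.
articles_2021_so_far = 1_069_316

# Assumption: "just under 157,000" is rounded to 157,000, and the remaining
# journals publish roughly as many articles in 2021 as they did in 2020.
articles_2020_in_remaining_group = 157_000

rough_projection = articles_2021_so_far + articles_2020_in_remaining_group
print(rough_projection)  # 1226316
```

Any growth in the remaining group, plus articles added in the second pass, would push the total above this figure.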

I’ve been providing daily summaries of journals counted, total to date, total with 2021 articles, and 2021 article count on my Twitter account–not difficult to find! (I’m boring: pretty much walt crawford everywhere…)

I’ve also been providing weekly summaries including counts of problematic or special cases, which so far include 251 inactive (no articles since 2019 or earlier, but at least one since 2015); 15 found at different URLs (there will be a lot more of these in the second pass); 66 dead or duplicate; 379 malware cases; seven that I don’t believe are OA journals; and 856 that were unreachable or unworkable. I would anticipate clearing most of the unreachable/unworkable–and so far, cross fingers, malware cases aren’t as numerous as last year (but still precisely 379 cases too high, since there’s no excuse for any of them). My optimistic target was to reach 12,900 or 13,000 by the end of March, and I’ve done better than that. Tomorrow will be devoted to other things, and there’s still enormous uncertainty about outside factors–but if things continue to go well, I could start the second pass before May.