GOA8: Week 3

January 21st, 2023

Before providing an updated set of counts, a note about likely schedule: It’s become obvious that (a) the changes in handling this year are working well, potentially much better than expected and (b) as a result, I haven’t the foggiest notion how long this is all going to take–almost certainly not as long as my previous pessimistic estimates, fortunately. I now believe “sometime in the spring” is the most useful estimate for completing the first data gathering pass–and, with varying degrees of luck and other stuff, maybe the second pass, data normalizing, and adding derived data columns. It’s even possible that I’ll start on the book and published dataset during very late spring (that is, before July 1), but that’s less likely.

The change in numbers is astonishing, both because things went well this week and because I encountered EDP Sciences and its set of Web of Conferences megajournals and have started in on Elsevier.

Now the numbers:

This was an even more productive week, with 1,400 more journals checked. The overall counts at this point are 3,700 journals checked, of which 3,235 published 240,270 articles in 2022 and 3,431 published 258,091 articles in 2021.

Some details–as always, about the full dataset to date, not this week’s portion.

  • Fee versus diamond/no-fee: 1,359 journals with fees, 2,341 without.
  • New vs. continuing: 458 newly-added, 3,242 continuing.
  • Need rechecking: 538 will be rechecked (including all of the “x” status below).
  • Status code:
    3,311 “a”–clean.
    86 “bi”– inactive (no articles since at least 2020).
    20 “bx”–done but at a different URL.
    18 “xd”–defunct, no articles since at least 2016.
    28 “xm”–malware (but not last year).
    8 “xn”–not an OA journal.
    1367 “xx”–unreachable or unworkable.
    And the two oddities:
    75 “xm2”–malware, also malware last year.
    8 “xx2”–unreachable or unworkable, as was true last year.
  • Ease of counting articles:
    “d” 1,925: easiest, taken directly from DOAJ
    “w” 290: easy, journal website provides direct numbers at either volume or issue number
    “f” 1,054: middling; numbers calculated using Find function for constants (e.g. “doi.” or “pdf”)
    “c” 164: slowest; articles counted manually.

GOA8: Week 2

January 14th, 2023

Somewhat unfortunately (see below), this was a very productive week, with 1,300 more journals checked. The overall counts at this point are 2,300 journals checked, of which 2,032 published 144,531 articles in 2022 and 2,146 published 147,617 articles in 2021.

Why somewhat unfortunately? Because–in addition to the speed with which BMC journals could be checked–that 1,300 came about partly because we had to skip two of our daily walks and I had to skip the usual Wednesday morning hike: just too wet. [Fortunately, our house is at the top of a rise, at 550 feet above sea level, and while Livermore’s regional parks were closed because of flooding and water hazards, we didn’t have flooding. Looking forward to the weeklong dry spell we’re supposed to get starting Monday–as long as it’s not the beginning of a months-long dry spell, as happened last year.]

Meanwhile, some details–and these will always be about the full dataset to date, not this week’s portion.

  • Fee versus diamond/no-fee: 809 journals with fees, 1,491 without.
  • New vs. continuing: 283 newly-added, 2,017 continuing.
  • Need rechecking: 310 will be rechecked (including all of the “x” status below).
  • Status code:
    2,091 “a”–clean and done.
    57 “bi”–inactive (no articles since at least 2020).
    13 “bx”–done but at a different URL.
    12 “xd”–defunct, no articles since at least 2016.
    28 “xm”–malware (but not last year).
    4 “xn”–not an OA journal.
    77 “xx”–unreachable or unworkable.
    And the two oddities:
    11 “xm2”–malware, as was true last year.
    6 “xx2”–unreachable, as was true last year.
  • Ease of article counting (added 1/15):
    “d” 1,099: easiest, taken directly from DOAJ
    “w” 235: easy, journal website provides direct numbers at either volume or issue number
    “f” 716: middling; numbers calculated using Find function for constants (e.g. “doi.” or “pdf”)
    “c” 113: slowest; articles counted manually.

I am seeing suggestions that even the modest $28/year I spend for Malwarebytes Pro (which covers my wife’s notebook as well) is needless, that there’s no need to pay for security beyond Windows’ built-in functions. I’m not willing to take that chance, and can give you about 30 reasons so far. [Some of the 39 were certificate problems that Windows itself absolutely catches.]

Where am I? The 2,300th journal is the Mesopotamia Journal of Agriculture, published in Iraq by the College of Agriculture. It has a $100 fee and published 36 articles in 2022.

GOA8: Week 1

January 7th, 2023

Here’s how things stand after the first week of visiting gold OA journals for GOA8 (in alphabetic order by publisher, then by journal):

1,000 journals scanned. 883 of them published 51,008 articles in 2022; 951 of them published 53,891 articles in 2021. (Both numbers subject to change on revisiting.)

At 1,000 journals a week, the first scan will be done in late May. There will be a known seven-week slowdown (which may or may not be major, and I don’t yet know when it will be–but not until at least January 30.) My daily minimum goal is 100 journals–which would take until July 9 to finish the first pass. I’m hoping the final time required will be somewhere in between.
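For anyone who wants to play with the arithmetic, here is a rough sketch of that kind of projection. It assumes the 18,790-journal base from the final download and a January 2 start, and the function name is purely illustrative. It makes no allowance for the slowdown or for days off, so the dates it prints are optimistic lower bounds rather than the estimates above.

```python
from datetime import date, timedelta
import math

TOTAL_JOURNALS = 18_790     # base size reported after the final download
START = date(2023, 1, 2)    # assumed first day of scanning


def projected_finish(per_day: float, start: date = START,
                     total: int = TOTAL_JOURNALS) -> date:
    """Date on which the last journal would be scanned at a steady daily pace."""
    days_needed = math.ceil(total / per_day)
    # Day 1 is the start date itself, so add days_needed - 1.
    return start + timedelta(days=days_needed - 1)


print(projected_finish(100))        # the daily-minimum pace
print(projected_finish(1000 / 7))   # 1,000 journals a week, spread over seven days
```

Changing per_day to whatever pace a given week actually achieved gives a quick sense of how far the finish date moves.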

Three details

The above numbers are from a pivot table in the “done” spreadsheet. I added three more tables to track items of interest during the scan–at least one of which might not be in the final report.

Counting codes

Of the 939 journals for which a count was feasible at this point:

  • 424 were code d–the DOAJ figure appears probable. This is the easiest.
  • 131 were code w–the journal web pages offered easy direct numbers for each issue or for the year. Also easy.
  • 328 were code f–I could use Find to determine the count for each issue (e.g., counting “pdf” or “doi.”). Not as easy, for various reasons.
  • 56 were code c–Counting articles by hand. By far the hardest.

Coded status

Of the 1,000 journals:

  • 910 are code a–which is the best code.
  • 21 are bi: inactive, with no articles since 2020
  • 9 are bx: journal is findable but at a different URL
  • 6 are xd: ceased or duplicate, with no articles since 2016
  • 19 are xm: malware or bad certificate (with luck, rechecks will reduce this number)
  • 3 are xn: Not an OA journal (two appear to require registration, one is an encyclopedia)
  • 32 are xx: currently unreachable or unworkable: rechecks should reduce this number
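A quick way to confirm that tallies like these cover every journal is to sum each breakdown against its stated total. The sketch below uses only the week 1 figures from the two lists above; the dictionary and function names are mine, not anything in the spreadsheet.

```python
# Week 1 tallies copied from the lists above.
status_counts = {"a": 910, "bi": 21, "bx": 9, "xd": 6, "xm": 19, "xn": 3, "xx": 32}
counting_codes = {"d": 424, "w": 131, "f": 328, "c": 56}


def check_total(counts: dict, expected: int) -> int:
    """Return the summed tally minus the expected total (0 means they agree)."""
    return sum(counts.values()) - expected


print(check_total(status_counts, 1000))   # status codes vs. journals scanned
print(check_total(counting_codes, 939))   # counting codes vs. journals counted
```

A nonzero result points to a miscoded or uncoded row worth hunting down before the next batch.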

Recheck?

141 of the first 1,000 journals need rechecking–either because xm or xx, or because they appear to be missing some 2022 data.

Off to a good start. Some weeks might show more journals, some may show (a lot) fewer.

GOA8 progress posts mostly on Mastodon

January 2nd, 2023

I’ve started the scan, and it’s looking reasonably promising. I may do a monthly update here, but more frequent updates will be on Mastodon.

GOA8: Final download and starting point

December 31st, 2022

The DOAJ folks kept busy–adding seven more journals today. I’ve downloaded, selected, massaged, and saved the final GOA8 base, as I’ll use it starting in the new year–either tomorrow or Monday–to analyze and verify data.

It’s not an exact match with DOAJ’s figures: the Base8 dataset has 18,789 rows, while there should be 18,790. So one journal will be missing. (That’s the smallest discrepancy since this project began.) [CORRECTION: I was able to locate the missing journal and restore it. The journal is Nalans from Karadeniz Technical University in Turkey. SO: The numbers now match.]

That figure–18,790–includes 16,741 journals continuing from previous years and 2,049 new to GOA8.

As was true last year, I’ll look at journals in order by publisher and by title within publisher–starting with the journal Analele Ştiinţifice Ale Universităţii Alexandru Ioan Cuza din Iași,Sectiunea II A : Genetica si Biologie Moleculara, published by “Alexandru Ioan Cuza” University of Iași, and ending with the journal مدیریت نوآوری published by مدیریت نوآوری.

[As one who used to work for RLG, the only nonprofit in the Unicode Consortium, the group that developed Unicode, I continue to find it remarkable how easy it is to incorporate nonroman material into a blog post or spreadsheet.]

So the real project begins. Tomorrow or the next day.

GOA8: Third download

December 30th, 2022

I did another download from DOAJ today, timestamped 1230_2035, and checked the add/del log again. Massaged the data and added 21 more journals while deleting 3.

Surprisingly, the count I have now in the dataset getting ready for searching is 18,783 journals–which is exactly the same as what DOAJ shows on its homepage as of the time I downloaded. That’s reassuring if rare. [Amended next day: It’s also wrong. Sigh. The last row is 18,783, which means 18,782 actual rows of data. Off by one: still very close.][And fixed on 1/1: the missing journal, Nalans, has been restored.]

One last check and, if needed, download tomorrow late afternoon.

GOA8: Second download and datachecks

December 26th, 2022

The folks at DOAJ have been busy! Based on the past couple of years, I’d guessed there would be under 50 new titles added between December 14 and December 31–but when I downloaded the metadata (and add/delete table) again, with the metadata timestamped 1225_1735, there were 72 new journals and six deletions.

I’ve massaged those into shape for the table used for data collection, including adding subjects and segments and normalizing countries. The base table now has 18,765 journals–which is remarkably close to the count DOAJ shows.

I’ll probably do this once more before the final early-January-1st pass, so that there’s not much to do before starting the count.

GOA8: first download, completed

December 23rd, 2022

I’ve finished applying subjects and segments to newly-added journal titles–and in a few cases modifying existing subjects. That’s it for preparing the data, except for additions and deletions after December 14.

A few notes about added subjects and countries, for what it’s worth:

Subjects

The subject with the most added journals is, of course, Medicine, with 352. Three others show at least 100 added journals: Education (142), Language&Literature (124) and Economics (117).

By percentage (new journals as percent of all titles), the two highest are Religion (71 titles, 16.5%) and Technology (42, 16.1%). Others with 12% or more added titles include Sociology (99 journals, 13.6%); Law (87 journals, 13.2%) tied with Mathematics (51 journals, 13.2%); Computer Science (67 journals, 12.3%); Ecology (78 journals, 12.1%); and Education (142 journals, 12.0%). Note that I apply Ecology as widely as possible–to any titles relating to sustainability or environmentalism.

An opportunity: Many journals could be placed in two or more of the 28 subjects I use in the GOA series. If you have personal knowledge of a journal and believe it would be better assigned to a different subject within the 28 (see any of the GOA books for a list), send me email–waltcrawford@gmail.com–between now and May 1, 2023. Include the journal title, where you think it belongs, and the EISSN (and/or ISSN), or better yet the DOAJ URL for the journal info.

Countries

Indonesia (as usual) has the most newly-added journals (350 journals, 16.2%); if Indonesia has cleaned up the malware problems at many of the universities, it could have more Gold OA journals than any other country.

Two other countries have more than 100 newly-added journals: the United Kingdom (138 journals, 6.9%) and the United States (111 journals, 10.4%).

By percentages, Brunei has the lead, since its only Gold OA journal is newly-added. Eliminating other relatively small OA publishing countries with either one or two newly-added titles (Côte d’Ivoire, Democratic Republic of the Congo, Uzbekistan, Panama, Guatemala, Angola, El Salvador, Honduras, and nineteen others with less than 25% added journals), the following have at least one-third newly-added journals: Syria (9 journals, 81.8%); Armenia (5 journals, 62.5%); Nigeria (9 journals, 45%); Egypt (34 journals, 40.5%); Pakistan (46 journals, 37.1%); and Philippines (7 journals, 33.3%).

Of course, all these figures are subject to change.

What’s Next?

The current base dataset covers journals added and deleted through December 14, 2022.

I’ll probably do last-minute changes in two batches–one early next week (or midweek) and one after midnight GMT, at the very start of 2023. As of right now, I see six removed titles and 33 added titles…but those numbers will change. (There may be another 36 titles that haven’t been added to the change log yet…although the chances of my count and DOAJ’s displayed count ever being identical are slender at best.)

Then, on January 2 (or POSSIBLY late in the day on January 1), I’ll start the slog, with quick updates every few days on Mastodon and less frequent updates here. I still don’t know when the seven weeks of reduced availability on weekdays will begin, but certainly not until at least January 15.

 

GOA8: First download, part 2

December 19th, 2022

Turns out it makes sense to split comments on the first download into three parts. This middle part concerns scans and changes that could affect the actual site survey. (The third part, involving the most time and effort, is normalizing country names and adding regions for the added journals, and providing subject and segment names for the added journals.)

Matching continuing journals

This time around, the first match used the DOAJ URLs, newly available last year. At that point, I also determined duplicates based on those URLs (call them DURLs for now).

That first match yielded 16,739 matches after eliminating 12 duplicates.

That left 1,960 new (or unmatched) journals for GOA8 and 535 unmatched GOA7 journals.

A second match using journal URLs yielded three matches. Comparing ISSNs yielded nine matches.

Thus, at this point there are 1,948 new journals and 522 unmatched GOA7 journals.
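The successive matching above (DOAJ URL first, then journal URL, then ISSN, each pass working only on what the previous pass left over) can be sketched roughly as follows. This is an illustration rather than the spreadsheet work actually done: the field names doaj_url, journal_url, and issn and the sample rows are all made up.

```python
def cascade_match(new_rows, old_rows, keys=("doaj_url", "journal_url", "issn")):
    """Match new rows to old rows, trying each key in turn on the leftovers."""
    matched = []                      # (new_row, old_row, key_used)
    new_left, old_left = list(new_rows), list(old_rows)
    for key in keys:
        # Index the still-unmatched old rows by this key, skipping blanks.
        index = {}
        for row in old_left:
            value = row.get(key)
            if value and value not in index:   # first occurrence wins
                index[value] = row
        still_new = []
        for row in new_left:
            hit = index.pop(row.get(key), None) if row.get(key) else None
            if hit is not None:
                matched.append((row, hit, key))
            else:
                still_new.append(row)
        used = {id(old) for _, old, _ in matched}
        old_left = [r for r in old_left if id(r) not in used]
        new_left = still_new
    return matched, new_left, old_left


# Made-up sample rows: A matches on DOAJ URL, B2 only on journal URL.
old = [
    {"title": "A", "doaj_url": "u1", "journal_url": "j1", "issn": "1111-1111"},
    {"title": "B", "doaj_url": "u2", "journal_url": "j2", "issn": "2222-2222"},
    {"title": "C", "doaj_url": "u3", "journal_url": "j3", "issn": "3333-3333"},
]
new = [
    {"title": "A", "doaj_url": "u1", "journal_url": "j1", "issn": "1111-1111"},
    {"title": "B2", "doaj_url": "u9", "journal_url": "j2", "issn": "2222-2222"},
    {"title": "D", "doaj_url": "u4", "journal_url": "j4", "issn": "4444-4444"},
]
matched, new_left, old_left = cascade_match(new, old)
print([(n["title"], o["title"], k) for n, o, k in matched])
print([r["title"] for r in new_left], [r["title"] for r in old_left])
```

Whatever remains in new_left corresponds to the new journals, and old_left to the unmatched GOA7 journals that then get compared against the deleted-journals list.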

Checking unmatched journals

Looking at DOAJ’s list of deleted journals (some of which may later have been restored), 277 of the leftover GOA7 could be cleared based on ISSN matches, another 74 based on title matching, leaving 171 mysterious cases–which is about par for the course. (None of those matched new/added journals based on proximate title matching.) There were also 38 deleted journals left over, but as already noted, some or all of these may have been restored.

The baseline continues to be 18,700 journals, of which 16,751 were also in GOA7.

Changes in continuing journals

  • It appears that 257 journals changed from no-fee to fee status and 356 changed from fee to no-fee. All of these will be checked directly for fee status and nature during the scan.
  • 306 increased fees by more than 10% and 1,600 decreased fees by more than 10%; in both cases, the large currency fluctuations of 2022 may explain many of these changes.
  • I defined “major fee changes” as a decrease or increase of both more than $50 and more than 20%. Using that definition, 149 journals had major fee increases while 310 had major fee decreases. All of these will be checked directly during the journal scan.
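The “major fee change” rule in the last bullet is easy to state as code. This sketch assumes both fees are already converted to US dollars and that the percentage is measured against the earlier (GOA7) fee; the function name and labels are illustrative only.

```python
def classify_fee_change(old_usd: float, new_usd: float) -> str:
    """Label a change as major only if it is more than $50 AND more than 20%."""
    if old_usd <= 0:
        # No earlier fee to compare against; no-fee/fee switches are tracked separately.
        return "not comparable"
    delta = new_usd - old_usd
    pct = abs(delta) / old_usd * 100   # assumption: percent of the earlier fee
    if abs(delta) > 50 and pct > 20:
        return "major increase" if delta > 0 else "major decrease"
    return "minor or none"


print(classify_fee_change(1000, 1300))  # $300 and 30%
print(classify_fee_change(100, 140))    # 40% but only $40
print(classify_fee_change(1000, 1100))  # $100 but only 10%
print(classify_fee_change(500, 300))    # $200 and 40% lower
```

Requiring both thresholds keeps small fees with large percentage swings, and large fees with small percentage swings, out of the “major” pile.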

In all, and accounting for all other cases where fees appear to be more than a straight fixed processing fee (based either on the fee code in GOA7 or the existence of a URL for “other fee” information), it appears that 1,890 journals need to be checked directly for nature, amount, and existence of fees–and 16,610 do not need to be checked.

That’s it for now.

GOA8: First download, part 1

December 15th, 2022

I’ve started prep work for GOA8, doing the first metadata downloads yesterday and dealing with currency conversion today, in the process dealing with fee issues for almost all of the journals. (I’ll do another download after midnight GMT on January 1–that is, around 4 pm here on 12/31–and deal with what’s likely to be a few dozen additional titles.)

Over the next week or two I’ll be combining data from GOA7 and the new download, adding subjects to the newly-added journals, and otherwise normalizing data; I’ll do another post when that’s done. Meanwhile, here’s what I see.

Date and Basic Counts

The exported DOAJ metadata has a timestamp of 20221214_2235–that is, 10:35 pm GMT on December 14, 2022. It includes 18,700 records. The add/delete spreadsheet shows 1,385 journals added and 391 removed during 2022. [Edit: 18,699, of course.]

Of those 18,699 journals, 5,836 show APCs and another 187 indicate other fees; 12,677 have no fees.

Of the 5,836 with APCs, 417 also show other fees.

Currency Conversions

I parsed the APC fields (which combine numbers and currency code and can contain multiple fees) to get the first fee and currency combination. That yielded 44 currencies. I entered average 2022 conversion rates for most currencies from ofx.com; December (first half) averages for a few others; and December 15 conversion for those currencies not available in other sources.
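As a rough sketch of that parsing step: the code below assumes an APC field looks something like “1500 USD; 1200 EUR” (an amount followed by a three-letter currency code, with multiple fees separated by semicolons), which is my guess at the layout rather than a documented format. The conversion rates are made-up placeholders, not the ofx.com figures, and the function names are illustrative.

```python
import re

# Placeholder rates (US dollars per unit of currency), for illustration only.
USD_PER_UNIT = {"USD": 1.0, "EUR": 1.05, "GBP": 1.24}

# An amount (digits, optional thousands commas and decimals), then a 3-letter code.
_FEE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*([A-Z]{3})")


def first_fee(apc_field: str):
    """Return (amount, currency) for the first fee in the field, or None."""
    m = _FEE.search(apc_field or "")
    if not m:
        return None
    return float(m.group(1).replace(",", "")), m.group(2)


def to_usd(apc_field: str, rates=USD_PER_UNIT):
    """Convert the first fee to whole US dollars; None if absent or rate unknown."""
    fee = first_fee(apc_field)
    if fee is None or fee[1] not in rates:
        return None
    amount, currency = fee
    return round(amount * rates[currency])


print(first_fee("1500 USD; 1200 EUR"))
print(to_usd("2,000 EUR"))
print(to_usd("300 XYZ"))   # currency with no rate on file
```

Returning None for an unknown currency, instead of guessing, leaves those rows visibly blank so they can be looked up by hand.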

2022 was a strong year for the US Dollar. 14 other currencies sank 10% or more against the dollar, but only three sank more than 20% (Turkish lira, Argentine peso, Ukrainian hryvnia). On the other hand, three currencies increased significantly (more than 0.7%) against the dollar, and I suspect that one of those–CNY, the Chinese yuan, which rose nearly 800%–is a case of confusion between China’s offshore and domestic currency. (The Russian ruble gained 18.8%, but it’s erratic in any case.)

Looking at the results, I found a large handful (28, all but one CNY) where the converted rate seemed suspiciously high and a few very low fees where a zero may be missing. I marked these all for further checking (as I did for all journals with “other” fees).

As of now:

I see 12,677 with no fee (but if the GOA7 data shows a fee, I’ll recheck); 6,023 with simple fees (but if the GOA7 shows anything but a simple fee, I’ll recheck); and 652 where a recheck is clearly involved. Compared to checking every journal’s website for fees, that’s an obvious time reduction–and seems likely to yield better results.