GOA8: Final download and starting point

Saturday, December 31st, 2022

The DOAJ folks kept busy–adding seven more journals today. I’ve downloaded, selected, massaged, and saved the final GOA8 base, as I’ll use it starting in the new year–either tomorrow or Monday–to analyze and verify data.

It’s not an exact match with DOAJ’s figures: the Base8 dataset has 18,789 rows, while there should be 18,790. So one journal will be missing. (That’s the smallest discrepancy since this project began.) [CORRECTION: I was able to locate the missing journal and restore it. The journal is Nalans from Karadeniz Technical University in Turkey. SO: The numbers now match.]

That figure–18,790–includes 16,741 journals continuing from previous years and 2,049 new to GOA8.

As was true last year, I’ll look at journals in order by publisher and by title within publisher–starting with the journal Analele Ştiinţifice Ale Universităţii Alexandru Ioan Cuza din Iași,Sectiunea II A : Genetica si Biologie Moleculara, published by “Alexandru Ioan Cuza” University of Iași, and ending with the journal مدیریت نوآوری published by مدیریت نوآوری.

[As one who used to work for RLG, the only nonprofit in the Unicode Consortium, the group that developed Unicode, I continue to find it remarkable how easy it is to incorporate nonroman material into a blog post or spreadsheet.]

So the real project begins. Tomorrow or the next day.

GOA8: Third download

Friday, December 30th, 2022

I did another download from DOAJ today, timestamped 1230_2035, and checked the add/del log again. Massaged the data and added 21 more journals while deleting 3.

Surprisingly, the count I have now in the dataset getting ready for searching is 18,783 journals–which is exactly the same as what DOAJ shows on its homepage as of the time I downloaded. That’s reassuring if rare. [Amended next day: It’s also wrong. Sigh. The last row is 18,783, which means 18,782 actual rows of data. Off by one: still very close.][And fixed on 1/1: the missing journal, Nalans, has been restored.]

One last check and, if needed, download tomorrow late afternoon.

GOA8: Second download and datachecks

Monday, December 26th, 2022

The folks at DOAJ have been busy! Based on the past couple of years, I’d guessed there would be under 50 new titles added between December 14 and December 31–but when I downloaded the metadata (and add/delete table) again, with the metadata timestamped 1225_1735, there were 72 new journals and six deletions.

I’ve massaged those into shape for the table used for data collection, including adding subjects and segments and normalizing countries. The base table now has 18,765 journals–which is remarkably close to the count DOAJ shows.

I’ll probably do this once more before the final early-January-1st pass, so that there’s not much to do before starting the count.

GOA8: first download, completed

Friday, December 23rd, 2022

I’ve finished applying subjects and segments to newly-added journal titles–and in a few cases modifying existing subjects. That’s it for preparing the data, except for additions and deletions after December 14.

A few notes about added subjects and countries, for what it’s worth:


The subject with the most added journals is, of course, Medicine, with 352. Three others show at least 100 added journals: Education (142), Language & Literature (124) and Economics (117).

By percentage (new journals as percent of all titles), the two highest are Religion (71 titles, 16.5%) and Technology (42, 16.1%). Others with 12% or more added titles include Sociology (99 journals, 13.6%); Law (87 journals, 13.2%) tied with Mathematics (51 journals, 13.2%); Computer Science (67 journals, 12.3%); Ecology (78 journals, 12.1%); and Education (142 journals, 12.0%). Note that I apply Ecology as widely as possible–to any titles relating to sustainability or environmentalism.

An opportunity: Many journals could be placed in two or more of the 28 subjects I use in the GOA series. If you have personal knowledge of a journal and believe it would be better assigned to a different subject within the 28 (see any of the GOA books for a list), send me email––between now and May 1, 2023. Include the journal title, where you think it belongs, and the EISSN (and/or ISSN), or better yet the DOAJ URL for the journal info.


Indonesia (as usual) has the most newly-added journals (350 journals, 16.2%); if Indonesia had cleaned up the malware problems at many of its universities, it would have more Gold OA journals than any other country.

Two other countries have more than 100 newly-added journals: the United Kingdom (138 journals, 6.9%) and the United States (111 journals, 10.4%).

By percentages, Brunei has the lead, since its only Gold OA journal is newly-added. Eliminating other relatively small OA publishing countries with either one or two newly-added titles (Côte d’Ivoire, Democratic Republic of the Congo, Uzbekistan, Panama, Guatemala, Angola, El Salvador, Honduras, and nineteen others with less than 25% added journals), the following have at least one-third newly-added journals: Syria (9 journals, 81.8%); Armenia (5 journals, 62.5%); Nigeria (9 journals, 45%); Egypt (34 journals, 40.5%); Pakistan (46 journals, 37.1%); and Philippines (7 journals, 33.3%).

Of course, all these figures are subject to change.

What’s Next?

The current base dataset covers journals added and deleted through December 14, 2022.

I’ll probably do last-minute changes in two batches–one early next week (or midweek) and one after midnight GMT, at the very start of 2023. As of right now, I see six removed titles and 33 added titles…but those numbers will change. (There may be another 36 titles that haven’t been added to the change log yet…although the chances of my count and DOAJ’s displayed count ever being identical are slender at best.)

Then, on January 2 (or POSSIBLY late in the day on January 1), I’ll start the slog, with quick updates every few days on Mastodon and less frequent updates here. I still don’t know when the seven weeks of reduced availability on weekdays will begin, but certainly not until at least January 15.


GOA8: First download, part 2

Monday, December 19th, 2022

Turns out it makes sense to split comments on the first download into three parts. This middle part concerns scans and changes that could affect the actual site survey. (The third part, involving the most time and effort, is normalizing country names and adding regions for the added journals, and providing subject and segment names for the added journals.)

Matching continuing journals

This time around, the first match used the DOAJ URLs, newly available last year. At that point, I also determined duplicates based on those URLs (call them DURLs for now).

That first match yielded 16,739 matches after eliminating 12 duplicates.

That left 1,960 new (or unmatched) journals for GOA8 and 535 unmatched GOA7 journals.

A second match using journal URLs yielded three matches. Comparing ISSNs yielded nine matches.

Thus, at this point there are 1,948 new journals and 522 unmatched GOA7 journals.
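For anyone who wants to try a similar cascade, here's a minimal sketch of matching in passes: each pass uses one key, and only the rows left unmatched by earlier passes go on to the next. The function name, the column names (`durl`, `url`, `issn`), and the sample rows are all made up for illustration; the actual work was done in spreadsheets, not with this code.

```python
def cascade_match(old_rows, new_rows, keys):
    """Match new rows to old rows, trying each key in order.

    Returns (matches, unmatched_old, unmatched_new); each match is
    (old_row, new_row, key_that_matched).
    """
    matches = []
    remaining_old = list(old_rows)
    remaining_new = list(new_rows)
    for key in keys:
        # Index the still-unmatched old rows by this key, skipping blanks.
        index = {}
        for row in remaining_old:
            value = row.get(key)
            if value:
                index.setdefault(value, row)
        matched_ids = set()
        still_new = []
        for row in remaining_new:
            value = row.get(key)
            old = index.pop(value, None) if value else None
            if old is None:
                still_new.append(row)
            else:
                matches.append((old, row, key))
                matched_ids.add(id(old))
        remaining_old = [r for r in remaining_old if id(r) not in matched_ids]
        remaining_new = still_new
    return matches, remaining_old, remaining_new


# Hypothetical sample rows (not real journals).
goa7 = [
    {"title": "A", "durl": "doaj/1", "url": "a.example", "issn": "1111-1111"},
    {"title": "B", "durl": "", "url": "b.example", "issn": "2222-2222"},
    {"title": "C", "durl": "", "url": "", "issn": "3333-3333"},
    {"title": "D", "durl": "doaj/4", "url": "d.example", "issn": "4444-4444"},
]
goa8 = [
    {"title": "A", "durl": "doaj/1", "url": "a-new.example", "issn": "1111-1111"},
    {"title": "B", "durl": "doaj/2", "url": "b.example", "issn": "2222-2222"},
    {"title": "C", "durl": "doaj/3", "url": "c.example", "issn": "3333-3333"},
    {"title": "E", "durl": "doaj/5", "url": "e.example", "issn": "5555-5555"},
]

matches, old_left, new_left = cascade_match(goa7, goa8, ["durl", "url", "issn"])
for old, new, key in matches:
    print(f"{new['title']} matched on {key}")
print("unmatched old:", [r["title"] for r in old_left])
print("new journals:", [r["title"] for r in new_left])
```

Trying the most reliable key first means the looser keys only ever see the leftovers, which keeps a weak key from producing false matches on rows that a stronger key could have settled.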

Checking unmatched journals

Looking at DOAJ’s list of deleted journals (some of which may later have been restored), 277 of the leftover GOA7 could be cleared based on ISSN matches, another 74 based on title matching, leaving 171 mysterious cases–which is about par for the course. (None of those matched new/added journals based on proximate title matching.) There were also 38 deleted journals left over, but as already noted, some or all of these may have been restored.

The baseline continues to be 18,700 journals, of which 16,751 were also in GOA7.

Changes in continuing journals

  • It appears that 257 journals changed from no-fee to fee status and 356 changed from fee to no-fee. All of these will be checked directly for fee status and nature during the scan.
  • 306 increased fees by more than 10% and 1,600 decreased fees by more than 10%; in both cases, the large currency fluctuations of 2022 may explain many of these changes.
  • I defined “major fee changes” as a decrease or increase of both more than $50 and more than 20%. Using that definition, 149 journals had major fee increases while 310 had major fee decreases. All of these will be checked directly during the journal scan.
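The rules in that list can be written down as a small function. This is only a sketch under a few assumptions that aren't spelled out above: both fees are already in US dollars, the percentage is measured against the older fee, and each journal gets a single label (the strongest one that applies), even though in the counts above the "major" group is presumably a subset of the "more than 10%" group. The function name and labels are invented for illustration.

```python
def classify_fee_change(old_fee, new_fee):
    """Label the change between last year's fee and this year's (both in USD)."""
    if old_fee == 0 and new_fee == 0:
        return "no fee"
    if old_fee == 0:
        return "no-fee to fee"
    if new_fee == 0:
        return "fee to no-fee"
    diff = new_fee - old_fee
    pct = abs(diff) / old_fee * 100  # assumption: percent of the older fee
    # "Major" requires BOTH more than $50 and more than 20%.
    if abs(diff) > 50 and pct > 20:
        return "major increase" if diff > 0 else "major decrease"
    if pct > 10:
        return "increase >10%" if diff > 0 else "decrease >10%"
    return "roughly unchanged"


print(classify_fee_change(1000, 1300))  # major increase
print(classify_fee_change(100, 140))    # increase >10% (only $40, so not major)
print(classify_fee_change(1000, 700))   # major decrease
print(classify_fee_change(0, 500))      # no-fee to fee
```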

In all, and accounting for all other cases where fees appear to be more than a straight fixed processing fee (based either on the fee code in GOA7 or the existence of a URL for “other fee” information), it appears that 1,890 journals need to be checked directly for nature, amount, and existence of fees–and 16,610 do not need to be checked.

That’s it for now.

GOA8: First download, part 1

Thursday, December 15th, 2022

I’ve started prep work for GOA8, doing the first metadata downloads yesterday and dealing with currency conversion today, in the process dealing with fee issues for almost all of the journals. (I’ll do another download after midnight GMT on January 1–that is, around 4 pm here on 12/31–and deal with what’s likely to be a few dozen additional titles.)

Over the next week or two I’ll be combining data from GOA7 and the new download, adding subjects to the newly-added journals, and otherwise normalizing data; I’ll do another post when that’s done. Meanwhile, here’s what I see.

Date and Basic Counts

The exported DOAJ metadata has a timestamp of 20221214_2235–that is, 10:35 pm GMT on December 14, 2022. It includes 18,700 records. [Edit: 18,699, of course.] The add/delete spreadsheet shows 1,385 journals added and 391 removed during 2022.
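As a small aside, a stamp in that form is easy to read mechanically. The sketch below assumes the layout is year, month, day, underscore, then 24-hour time, and that the time is GMT as stated above; the function name is made up.

```python
from datetime import datetime, timezone


def parse_export_stamp(stamp):
    """Parse a stamp such as '20221214_2235' as a GMT (UTC) datetime."""
    return datetime.strptime(stamp, "%Y%m%d_%H%M").replace(tzinfo=timezone.utc)


when = parse_export_stamp("20221214_2235")
print(when.isoformat())  # 2022-12-14T22:35:00+00:00
```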

Of those 18,699 journals, 5,836 show APCs and another 187 indicate other fees; 12,677 have no fees.

Of the 5,836 with APCs, 417 also show other fees.

Currency Conversions

I parsed the APC fields (which combine numbers and currency code and can contain multiple fees) to get the first fee and currency combination. That yielded 44 currencies. I entered average 2022 conversion rates for most currencies from; December (first half) averages for a few others; and December 15 conversion rates for those currencies not available in other sources.
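Here's a rough sketch of that kind of parsing, for the curious. It assumes an APC field looks something like "1500 USD; 1200 EUR" (an amount, then a three-letter currency code, with no thousands separators); that layout is an assumption, not a description of DOAJ's actual export. The rates in the table are placeholders, not the rates used for GOA8, and the function names are invented.

```python
import re

# Placeholder rates (units of the currency per 1 US dollar); NOT the GOA8 rates.
RATES_PER_USD = {"USD": 1.0, "EUR": 0.95, "GBP": 0.81}

# An amount (optionally with decimals) followed by a three-letter currency code.
FEE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Z]{3})")


def first_fee(apc_field):
    """Return (amount, currency) for the first fee in the field, or None."""
    match = FEE_PATTERN.search(apc_field or "")
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


def to_usd(amount, currency, rates=RATES_PER_USD):
    """Convert an amount to US dollars, rounded to the nearest dollar."""
    return round(amount / rates[currency])


print(first_fee("1500 USD; 1200 EUR"))  # (1500.0, 'USD')
print(first_fee(""))                    # None
amount, currency = first_fee("950 EUR")
print(to_usd(amount, currency))         # 1000
```

Collecting the set of currency codes that come out of `first_fee` across all rows is one way to arrive at a list of currencies needing conversion rates.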

2022 was a strong year for the US Dollar. 14 other currencies sank 10% or more against the dollar, but only three sank more than 20% (Turkish lira, Argentine peso, Ukrainian hryvnia). On the other hand, three currencies increased significantly (more than 0.7%) against the dollar, and I suspect that one of those–CNY, the Chinese yuan, which rose nearly 800%–is a case of confusion between China’s offshore and domestic currency. (The Russian ruble gained 18.8%, but it’s erratic in any case.)

Looking at the results, I found a large handful (28, all but one CNY) where the converted rate seemed suspiciously high and a few very low fees where a zero may be missing. I marked these all for further checking (as I did for all journals with “other” fees).

As of now:

I see 12,677 with no fee (but if the GOA7 data shows a fee, I’ll recheck); 6,023 with simple fees (but if the GOA7 shows anything but a simple fee, I’ll recheck); and 652 where a recheck is clearly involved. Compared to checking every journal’s website for fees, that’s an obvious time reduction–and seems likely to yield better results.

GOA8: Addendum/change to decisions and schedule

Saturday, November 26th, 2022

After thinking about it and running a full-scale test run, I’ve made a decision that should do two things:

  1. Improve the quality of APC/fee information, by eliminating possible transcription errors and by using 2022 fee/APC levels rather than those of early 2023.
  2. Save time–potentially a lot of time–thus making it possible that GOA8 will be done this Spring, or at least early summer. Not certain, but possible.

The changes

Around December 14, I’ll download the DOAJ metadata as usual. But this time, I’ll start out by doing the following:

  • Determine currency usage.
  • Immediately look up exchange rates (using Forex annual-median where feasible, shorter-term where not).
  • Prepare stripped fee amount and currency columns in DOAJ metadata, and perform the conversions right away.
  • Populate the GOA8 master spreadsheet with those values, and other values as appropriate, to wit:
  • St8 (new status) is n (no fee) if there is no fee, f (fee) if there is one, BUT if the DOAJ metadata shows the possibility of other fees (they now have a field for that), then x.
  • Fee code will start at “d” (derived from DOAJ) in all cases.

Since I retain the fee (call it Fee7), fee code (Fc7), and status (St7) from GOA7, it’s an easy matter to change St8 to “x” whenever Fc7 is any code indicating something other than a standard fee (e.g., submission, both submission and processing, membership, variable fee).
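Spelled out as code, the starting status might be derived along these lines. This is a sketch only: the one-letter GOA7 codes in `NONSTANDARD_FC7` are invented stand-ins (the real fee codes aren't listed here), and the function and argument names are hypothetical.

```python
# Invented stand-ins for GOA7 fee codes meaning "not a plain fee":
# submission, both submission and processing, membership, variable.
NONSTANDARD_FC7 = {"s", "b", "m", "v"}

# Every journal starts with fee code "d" (derived from DOAJ).
INITIAL_FEE_CODE = "d"


def initial_st8(doaj_fee, doaj_other_fees, fc7):
    """Return the starting St8: 'n' (no fee), 'f' (fee), or 'x' (check the site)."""
    # DOAJ flags possible other fees, or last year's code was nonstandard.
    if doaj_other_fees or fc7 in NONSTANDARD_FC7:
        return "x"
    return "f" if doaj_fee > 0 else "n"


print(initial_st8(0, False, ""))     # n
print(initial_st8(1200, False, ""))  # f
print(initial_st8(1200, True, ""))   # x
print(initial_st8(0, False, "s"))    # x
```

Only the rows that come out as "x" need a visit to the journal's own fee page; everything else keeps the downloaded and converted fee.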

While going through the journals, I’ll look for fee information in the journal websites if the St8 is “x”–and populate the fee code appropriately. If not, I’ll just use the downloaded/converted fee.

A trial run suggests that I’ll need to look for fee info within journal websites in about 1,300 journals (out of more than 17,000).

Note that this should make overall fee info *more* accurate because I’ll be using 2022 fee levels, not 2023. And this process essentially removes the possibility of transcription errors.

So: target for completion is still “whenever,” but a considerably earlier “whenever.”

GOA8: Decisions and preliminary schedule

Monday, November 21st, 2022

Here’s where things stand with regard to Gold Open Access 2017-2022:


  • I will be using fees from DOAJ where that seems appropriate, based on fee code for last year and other factors. (Usually decided publisher-by-publisher; some big publishers provide spreadsheets with fees, which I’ll use.)
  • I will be using counts from DOAJ where that seems appropriate, based on availability of counts and consistency with previous data.
  • There will be a new CC column for count codes, e.g. d (DOAJ), f (pattern find), w (provided on website), e (estimate–rarely if ever used)
  • Unless things go more smoothly than expected (see below), malware sites will only be checked twice (and sites that had malware in GOA7 will only be checked once)
  • Final decision on Country book won’t be made until later, but given the underwhelming usage and interest, it may not happen.

Preliminary steps (now to December 15)

  • Clear out pre-GOA7 folders: archive master spreadsheets and ms, delete other files.
  • Set up and prepopulate GOA8 folder; make “test” copy of master with years changed for template work
  • Rearrange columns of new master file (in data gathering, years are in descending order; in reports, they’re ascending).
  • Build and test templates for GOA8, based on G7 templates (but with new CC column)
  • Clear other stuff to allow more time–and, giving the devil his due, that was made much easier when, ahem, certain decisions made me unfollow everybody on Twitter and make it an inactive placeholder, instead of deciding one-by-one how many of 100+ Follows to get rid of.

December 15-31

  • Preliminary DOAJ download[s] (that is, the master database and the added/removed spreadsheet)
  • Match new DURL column as first match between old master and new download
  • As needed, do second match (ISSN and eISSN) and, if needed, third match (normalized URLs).
  • Remove deletions–that is, old data clearly marked as removed from DOAJ
  • Save unmatched for 12/31 rerun.
  • Massage data: copy 2021-2017 data, subject, category, and (for year-to-year comparisons) country and fee, but use new/DOAJ data in all other cases.
  • Normalize data and add subjects to newly-added journals.
  • Build and populate currency conversion spreadsheet based on DOAJ fee & submission currency occurrences
  • Afternoon of 12/31 (after midnight GMT): New downloads; do new matches and add new data.
  • Attempt to account for unmatched data.
  • As of yesterday (11/20), DOAJ shows 18,561 journals, with 1,285 added this year and 382 removed.

Data gathering, starting January 1, 2023

  • Almost certainly done in publisher order again.
  • Updates here monthly (or so), weekly or more often with #goa8 hashtag at Mastodon, with occasional updates on Facebook (and, if sanity returns, maybe on Twitter).
  • Last year, this process was completed on April 20, with additional testing taking through May 5.
  • Factors that may speed things up: using more DOAJ data; there are fewer new journals this year.
  • Factors that will slow things down: there are more journals–but, probably more important, I already know that health issues will chew up around half of the time I’d normally devote to testing for seven weeks somewhere in the first half of the year–and if some side effects come to play, it may impact timing even more. (And, of course, that’s only issues I already know about…)
  • So my “schedule” is that I hope to have the first pass done by late June (or earlier), but the testing passes might run into July or August. BUT NOTE UPDATE: these changes may speed things up by a month or possibly more.
  • Updates as appropriate.