Archive for April, 2023

GOA8: Week 17

Saturday, April 29th, 2023

I’ve completed the first portion of the second pass, asked some questions and got some answers, and had a new thought. Here goes:

ISSNs

I’m satisfied that ISSNs do serve (some) purpose in the spreadsheet, so I’ll keep them–and, perhaps to make them a bit more useful, when I do final cleanup I’ll see to it that e-ISSNs are used in all cases where available.
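For anyone doing a similar cleanup, the "e-ISSN where available" rule can be sketched in a few lines. This is a minimal Python illustration, not a description of the actual spreadsheet work: the function name, the row layout, and the ISSN values are all made up for the example.

```python
def preferred_issn(print_issn, e_issn):
    """Return the e-ISSN when one is present, otherwise fall back to the print ISSN."""
    e_issn = (e_issn or "").strip()
    print_issn = (print_issn or "").strip()
    return e_issn or print_issn


# Hypothetical rows: (title, print ISSN, e-ISSN); the ISSN values are invented.
rows = [
    ("Journal A", "1111-1111", "2222-2222"),
    ("Journal B", "3333-3333", ""),
    ("Journal C", "", "4444-4444"),
]

# Keep one ISSN per journal, preferring the electronic one.
cleaned = [(title, preferred_issn(p, e)) for title, p, e in rows]
for title, issn in cleaned:
    print(title, issn)
```

A journal with only a print ISSN keeps it, so no row ends up empty unless both values were missing to begin with.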

Pass 2, Part 1: problematic journals

This involved around 1,200 journals–mostly xx and xm (but not xx2 or xm2). This is a slogging process (with up to four paths to try to find a “good” site), but definitely productive. (Some 20 journals that should have been in pass 2 part 2–now part 3, see below–were accidentally included here, which does no harm.)

At the end of the scan, I had 307 journals that could be excluded (xx, xm, xn, xo) and 926 journals that are good to go. The latter include about 44,000 2022 articles; the former perhaps 3,100. In practice, most of the 307 journals will be included–all except those that aren’t really journals or are both unfindable and no longer in DOAJ.

Given how well that went, I’ll add another partial check before the scan of 864 journals that seem at least plausibly likely to have more 2022 issues added since they were scanned. By adding Part 2 and making this Part 3, they’ve had four full months to do late additions.

The new Part 2 is a quick scan of the 416 xx2 and xm2 journals–ones that have been problematic for more than one year. Basically, I’ll check each URL; any that are actually available (not xx or xm), I’ll scan properly and count as restored. I will be surprised (pleasantly) if there are more than a couple of dozen of these: journals that are bad for two years tend to stay bad (or get removed from DOAJ). UPDATE: see next post. I did a fuller check, and was indeed pleasantly surprised.

Best guess: that quick scan should take two or three. Part 3 may use the rest of the week, maybe more (there are real-world things that interfere). With a lot of luck, I might be done with data gathering by the end of next week, setting the stage for normalization and adding derived data (e.g., peak articles, revenue, categories of size and price).

New data issues

As already noted, I’ll keep ISSNs.

Having heard no comments to the contrary, I’ll drop fee code from the spreadsheet. (Count code was never in the spreadsheet.)

I’m now looking at code “bx”–available at a different URL. It can happen for any number of reasons. In some previous years, I didn’t actually change the URL in the spreadsheet. I do that now. Last year there were 699 such cases; the year before that, 730. This year there are 438, there for a range of reasons. I don’t believe they add anything to the spreadsheet: they’re part of the data-gathering process. Unless I hear reasons not to, I’ll change them to “a,” which will then be a clean code for “active” in 2021-2022.
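Folding “bx” into “a” is a one-line recode. A minimal sketch, assuming Python and a plain list of status codes; the function name and the sample codes are hypothetical, and the real change would be made in the spreadsheet itself.

```python
from collections import Counter


def recode_status(codes, old="bx", new="a"):
    """Return a copy of codes with every `old` code replaced by `new`."""
    return [new if code == old else code for code in codes]


# Hypothetical sample of status codes, one per journal.
codes = ["a", "bx", "xm", "bx", "bi", "a"]
recoded = recode_status(codes)

# Tally before and after: the total stays the same, "bx" disappears.
print(Counter(codes))
print(Counter(recoded))
```

Every other code is left untouched, so only the “a” and “bx” tallies change.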

GOA8: Some data questions and a progress report

Tuesday, April 25th, 2023

I’ve done as much crosschecking as makes sense at this point, and started on the second pass–around 2,000-2,400 journals to be looked at. That process can be rewarding but slow (an xx/xm journal can be restored in one of four ways, for example, each tried in turn). So I’ll just say “a couple of weeks” where “couple” means 1.5 to 3 or more–plus a week or so for final crosschecks and adding derived data.

The Data Questions

I’m considering some data retention/display changes:

  1. ISSN: I don’t believe this is serving any purpose, especially since a journal can have more than one. Before DOAJ added unique URLs, it was one way of identifying a journal, but has never had any role in calculation or display. Unless I hear a good reason not to, this will disappear from the master & shared datasets. [Some amplification: Every DOAJ/URL in the spreadsheet points directly to the DOAJ page with one or both ISSNs for the journal, so there’s no loss of access whatsoever. And just looking at the Figshare data, you can’t tell whether it’s the “right” ISSN.]
  2. Fc (Fee code): I’m inclined to drop this because, now that I’m starting from DOAJ fee numbers, it’s not very useful or reliable. I’m not sure it ever was very useful.
  3. Count code: This has never appeared on Figshare, and was used for the first time this year to track where I was getting article counts for each journal. It’s interesting in a vague summary way (and has been in the weekly reports), but nothing more. I may or may not use it again in future GOAs, if any, but see no reason to add it to the shared spreadsheet.

That’s it. You know where to comment. If there are any comments I’ll look at them–but I’m not holding my breath.

Meanwhile, the P2 scan has yielded 398 journals that can be fully used and 515 exclusions, including xx2 and xm2 exclusions, with 735 more problematic journals to go and 864 journals that might have picked up more articles. Depending on how that goes, I might do a very fast rescan of the 417 xm2/xx2 journals. Still hoping to finish the prep work and start (but not finish) the book in May 2023. With luck.

GOA8: Week 16

Saturday, April 22nd, 2023

Between the first pass and the second/final pass comes consistency checking, which can take an hour or two to a day or four. That’s still going on, but may be done soon. Meanwhile…

GOA8: Week 15.5, end of first pass

Wednesday, April 19th, 2023

That’s right! The first pass is complete. Now comes a week or two of data checking and rechecking somewhere between 2,000 and 2,450 journals (depending on my xm2/xx2 decision). The numbers below are subject to change in two ways: some journals that have counts will be excluded from the final dataset (last year, that was around 6,000 articles eliminated) and some journals will have more articles added (probably around 3,000 articles based on last year) and, with luck, some unavailable journals that *don’t* have counts will become available.

The numbers

The overall counts at this point:
18,769 journals checked, of which
16,548 published 1,420,735 articles in 2022 and
17,457 published 1,334,553 articles in 2021.

The rest of the numbers:

  • Fee versus diamond/no-fee: 5,838 journals with fees, 12,931 without. Just over two-thirds of journals are fee-free.
  • New vs. continuing: 2,164 newly-added, 16,605 continuing (including all of the “x” status below).
  • Status code:
    16,562 “a”–clean.
    447 “bi”–inactive (no articles since at least 2020).
    75 “bx”–done but at a different URL.
    109 “xd”–defunct, no articles since at least 2016.
    326 “xm”–malware (but not last year).
    57 “xn”–not an OA journal (including those removed this year but before I got to them) and ones suddenly requiring a login.
    776 “xx”–unreachable or unworkable.
    And the two oddities:
    359 “xm2”–malware, also malware last year
    58 “xx2”–unreachable or unworkable, also last year.
  • Ease of article counting:
    “d” 9,410: easiest, taken directly from DOAJ (sometimes with 2022 count modified)
    “w” 1,022: easy, journal website provides direct numbers at either volume or issue level.
    “f” 5,455: middling; numbers calculated using Find function for constants (e.g. “doi.” or “pdf”)
    “c” 566: slowest; articles counted manually.
  • Why the counts of “ease of…” don’t add up to total journals counted: all xd and bi cases, not quite all other non-a cases. If I couldn’t count them at all…
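The status and fee tallies above can be cross-checked with a little arithmetic. A minimal sketch, assuming Python; the variable names are only illustrative, and the figures are copied straight from this post.

```python
# Status-code counts as listed above (end of first pass).
status_counts = {
    "a": 16562, "bi": 447, "bx": 75, "xd": 109, "xm": 326,
    "xn": 57, "xx": 776, "xm2": 359, "xx2": 58,
}
journals_checked = 18769
fee, no_fee = 5838, 12931

status_total = sum(status_counts.values())
no_fee_share = no_fee / (fee + no_fee)

print(status_total == journals_checked)   # True: status codes cover every journal
print(fee + no_fee == journals_checked)   # True: fee split covers every journal
print(f"{no_fee_share:.1%}")              # 68.9%, i.e. just over two-thirds
```

The "ease of counting" codes are deliberately left out of the check, since they are not expected to add up to the total.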

 

 

GOA8: Week 15

Saturday, April 15th, 2023

Another fairly strong week, and clearly the penultimate week for the first pass. There are 869 journals left to scan, and around 500 of those are either from Wiley or Wolters Kluwer/Wolters Kluwer Medknow. (Just finishing up Vilnius University Press, and if all the rest were as clearly done, this would be even easier.)

I anticipate finishing the first pass. I expect to do a little cleanup/consistency work. I hope to split off those needing further attention (around 2,400, but that will rise a bit) and check those for journals dropped from DOAJ since 1/1/2023–and probably decide whether to skip rechecking for xx2/xm2 cases. The following week or two involves rechecking several hundred journals that might reasonably have added 2022 issues since they were first checked, and then resolving the rest of the problematic cases. (After that? A couple of days to add derivative columns such as revenue, journal size and article price, then start on the book–with luck, in very early May.)

The numbers

1,300 more journals checked.

The overall counts at this point:
17,900 journals checked, of which
15,733 published 1,347,316 articles in 2022 and
16,637 published 1,263,927 articles in 2021.

The rest of the numbers:

  • Fee versus diamond/no-fee: 5,450 journals with fees, 12,450 without.
  • New vs. continuing: 2,057 newly-added, 15,843 continuing (including all of the “x” status below).
  • Status code:
    15,740 “a”–clean.
    422 “bi”–inactive (no articles since at least 2020).
    73 “bx”–done but at a different URL.
    108 “xd”–defunct, no articles since at least 2016.
    321 “xm”–malware (but not last year).
    54 “xn”–not an OA journal (including those removed this year but before I got to them) and ones suddenly requiring a login.
    765 “xx”–unreachable or unworkable.
    And the two oddities:
    359 “xm2”–malware, also malware last year
    58 “xx2”–unreachable or unworkable, also last year.
  • Ease of article counting:
    “d” 9,410: easiest, taken directly from DOAJ (sometimes with 2022 count modified)
    “w” 1,022: easy, journal website provides direct numbers at either volume or issue level.
    “f” 5,455: middling; numbers calculated using Find function for constants (e.g. “doi.” or “pdf”)
    “c” 566: slowest; articles counted manually.
  • Why the counts of “ease of…” don’t add up to total journals counted: all xd and bi cases, not quite all other non-a cases. If I couldn’t count them at all…
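The “f” method (counting a constant such as “doi.” or “pdf” with the browser’s Find function) could be approximated in code. This is a rough Python sketch under stated assumptions, not a description of how the counts above were produced: the function name and the sample page text are hypothetical, and the DOIs in it are placeholders.

```python
def count_marker(page_text, marker):
    """Count non-overlapping, case-insensitive occurrences of marker in page_text."""
    return page_text.lower().count(marker.lower())


# Hypothetical text of an issue's table of contents, one line per article.
sample_page = """
Article one  https://doi.org/10.0000/example.1  PDF
Article two  https://doi.org/10.0000/example.2  PDF
Article three  https://doi.org/10.0000/example.3  PDF
"""

print(count_marker(sample_page, "doi."))
print(count_marker(sample_page, "pdf"))
```

As with Find itself, the raw count is only as good as the marker: if a page shows the marker twice per article, or in a footer, the number needs adjusting by hand.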

And I’d still appreciate feedback on the Diamond OA idea. Anyone out there? [At the moment I’m inclined to do it, but would love a little support…]

 

GOAJ stats: April 10, slightly incomplete

Monday, April 10th, 2023

I forgot to do another statistics run at the end of 2022, so some of these figures (PDF downloads, not Figshare data use) are missing part of December 2022. Given the low rate of use, I doubt that it makes much difference.

Gold Open Access 7

PDF: 1,023 downloads (no books)

Country book: 195 downloads (no books)

Database: 53 downloads, 272 views.

Gold Open Access 6

PDF: 2,930 downloads (no books)

Country book: 456 downloads

Database: 165 downloads, 1,018 views.

GOA8: Week 14

Saturday, April 8th, 2023

While I’m still low on energy, it’s coming back. This was a very strong week–but partly for poor reasons. I scanned 1,300 journals and completed the bulk of Indonesian universities–and Indonesia continues to have a serious malware problem. To wit: at the end of Week 13 there were 232 “xm” malware journals and 216 “xm2”–malware last year as well. Now, there are 299 “xm”–and an awful 351 xm2. But it could have been worse: One university with roughly 50 “xm” journals in a row last year (as is commonly the case, the infection is at a root level) apparently corrected the problem this year: All 50 are clean. So 351 is awful, but 401 would be even worse.

At the moment, there are 2,169 journals left to scan (last week’s figure was off by The Usual One: the last row is one high, because there’s a heading row). That suggests that the first scan should be finished around April 22 (cross fingers). But there are currently 2,275 journals left to recheck, which could take another two weeks (unless I opt not to recheck 405 xm2/xx2 journals, which I’m inclined to do). Actually, I’ll probably spend a week on cleanup and then recheck, to allow one more week for late-published 2022 issues to show up. (That’s the remaining 1,500-odd journals ignoring xm/xx).

The numbers

1,300 more journals checked.

The overall counts at this point:
16,600 journals checked, of which
14,584 published 1,311,198 articles in 2022 and
15,420 published 1,223,884 articles in 2021.

The rest of the numbers:

  • Fee versus diamond/no-fee: 5,241 journals with fees, 11,359 without.
  • New vs. continuing: 1,908 newly-added, 14,692 continuing (including all of the “x” status below).
  • Status code:
    14,555 “a”–clean.
    384 “bi”–inactive (no articles since at least 2020).
    69 “bx”–done but at a different URL.
    97 “xd”–defunct, no articles since at least 2016.
    299 “xm”–malware (but not last year).
    51 “xn”–not an OA journal (including those removed this year but before I got to them) and ones suddenly requiring a login.
    720 “xx”–unreachable or unworkable.
    And the two oddities:
    351 “xm2”–malware, also malware last year
    54 “xx2”–unreachable or unworkable, as was true last year.
  • Ease of article counting:
    “d” 8,768: easiest, taken directly from DOAJ (sometimes with 2022 count modified)
    “w” 962: easy, journal website provides direct numbers at either volume or issue level.
    “f” 5,004: middling; numbers calculated using Find function for constants (e.g. “doi.” or “pdf”)
    “c” 517: slowest; articles counted manually.
  • Why the counts of “ease of…” don’t add up to total journals counted: all xd and bi cases, not quite all other non-a cases. If I couldn’t count them at all…

And I’d still appreciate feedback on the Diamond OA idea. Anyone out there?

 

GOA8: Week 13

Saturday, April 1st, 2023

On one hand, the therapy’s done. On the other, fatigue really has hit me–at the moment, the gain in time and loss in energy about balance. That should change.

I’m still really looking for feedback on the Diamond OA idea (see here).

At the moment, there are 3,470 journals left to scan. That suggests that the first scan should be finished around April 22 (cross fingers). But there are about 2,000 journals left to recheck, which could take another two weeks (unless I opt not to recheck 500+ xx2/xm2 journals, which may be reasonable). Allow another week for normalizing data and a few days to add derived data, and with luck I’ll be ready to start on the book in late May.

The numbers

1,100 more journals checked.

The overall counts at this point:
15,300 journals checked, of which
13,523 published 1,282,430 articles in 2022 and
14,235 published 1,191,360 articles in 2021.

The rest of the numbers:

  • Fee versus diamond/no-fee: 4,840 journals with fees, 10,460 without.
  • New vs. continuing: 1,762 newly-added, 13,539 continuing (including all of the “x” status below).
  • Status code:
    13,585 “a”–clean.
    362 “bi”–inactive (no articles since at least 2020).
    65 “bx”–done but at a different URL.
    94 “xd”–defunct, no articles since at least 2016.
    232 “xm”–malware (but not last year).
    46 “xn”–not an OA journal (including those removed this year but before I got to them) and ones suddenly requiring a login.
    649 “xx”–unreachable or unworkable.
    And the two oddities:
    216 “xm2”–malware, also malware last year
    51 “xx2”–unreachable or unworkable, as was true last year.
  • Ease of article counting:
    “d” 8,212: easiest, taken directly from DOAJ (sometimes with 2022 count modified)
    “w” 950: easy, journal website provides direct numbers at either volume or issue level.
    “f” 4,482: middling; numbers calculated using Find function for constants (e.g. “doi.” or “pdf”)
    “c” 506: slowest; articles counted manually.
  • Why the counts of “ease of…” don’t add up to total journals counted: all xd and bi cases, not quite all other non-a cases. If I couldn’t count them at all…