Author Archive

Notes on comments

Friday, April 7th, 2017
  1. By default, comments are off (quite a few posts don’t really need commenting, and every post draws robospam). I don’t always remember to turn them on in cases where feedback is desirable.
  2. The spam software I formerly used allowed me to review all the spam, which I did. That software isn’t compatible with current WordPress. The software I’m using now does not show me spam, so it’s difficult to rescue a comment.
  3. The solution in both cases: send me email (waltcrawford@gmail.com), and if the comment is supposed to be attached to a post, say so: I’ll do that as appropriate.

Cites & Insights 17:3 (April 2017) available

Thursday, April 6th, 2017

Cites & Insights 17:3 (April 2017) is now available for downloading at http://citesandinsights.info/civ17i3.pdf

The 32-page 6″x9″ single-column issue* includes two essays:

The Art of the Beall   pp. 1-20

[Hat-tip to Phil Davis for the title.] The blacklists have “disappeared,” but not the blather. Almost entirely material from January 16, 2017 to April 3, 2017. And remember that a comprehensive study of journals that were on the lists and their article counts from 2012 through June 30, 2016 is available as C&I 17.1.

Libraries and Communities  pp. 21-32

If the first essay’s all recent material, this one’s not: items date from October 2009 to May 2014. Some thoughts on libraries and/in their communities, mostly by people better qualified to write about these things than I am.

*Reminder: Cites & Insights is now optimized for online/tablet reading. If you’re printing it out, I recommend having your PDF software print as a booklet, which should require 8 sheets of paper. Very slightly smaller type, good paper efficiency.

The Problems with Shen/Björk’s “420,000”

Monday, April 3rd, 2017

[This is Chapter 4 of Gray OA 2012-2016, a comprehensive study of journals on “the lists.”]

Cenyu Shen and Bo-Christer Björk published “‘Predatory’ open access: a longitudinal study of article volumes and market characteristics” in BMC Medicine 13, October 2015. (I’m bemused at the idea that this is a medical paper, but that’s a separate discussion.) I started questioning the paper’s conclusions as soon as it appeared, and continued to do so in my blog and in Cites & Insights.

Quite apart from the apparent assumption that Beall’s word is gospel when it comes to journals being “predatory”—an assumption I found, and find, appalling—I thought the numbers were implausible. The authors used a sample of 613 journals to assert that there were around 8,000 active “predatory” journals in 2014 and that those journals published around 420,000 articles in 2014 (up from around 310,000 in 2013 and 212,000 in 2012).

When presented with a case for the implausibility of the numbers, the authors responded that the article was peer-reviewed and used proper statistical methods. As I was writing this, I took the time to read the open reviewer comments on the article and the authors’ responses. Notably, all of the reviewers said they weren’t qualified to review the statistics—and there were certainly questions raised about the assumption that to be on Beall’s list was to be predatory.

The authors are right about one thing: looking at all the journals is a ridiculously large task. But that task showed that gray journals are just as heterogeneous as I thought they were, making it easy for a 6% sample to be wildly off base.

The First Cut

Now that I’ve done the work, the first note could be that the article’s 2014 figure has the first two digits reversed: it’s closer to 240,000 than to 420,000. Of course, the authors did not accidentally transpose digits; they came up with too-large results. Instead of 420,000 for 2014, 310,000 for 2013 and 212,000 for 2012, the figures should be 255,000 for 2014, 189,000 for 2013 and 125,000 for 2012 (rounding to the nearest thousand)—consistently between 59% and 61% of the article’s figures.
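For readers who want to check the 59% to 61% range, here is a minimal Python sketch that divides each corrected total by the article's figure for the same year. It uses only the thousands-rounded numbers quoted in the paragraph above, so the percentages are approximate; the variable and function names are illustrative, not part of any published method.

```python
# Article's published estimates vs. corrected totals, both in thousands of
# articles (as quoted in the text, rounded to the nearest thousand).
article_estimate = {2012: 212, 2013: 310, 2014: 420}
corrected_total = {2012: 125, 2013: 189, 2014: 255}


def ratio_by_year(actual, estimate):
    """Return actual/estimate for every year present in both mappings."""
    return {year: actual[year] / estimate[year]
            for year in sorted(estimate) if year in actual}


ratios = ratio_by_year(corrected_total, article_estimate)
for year, r in ratios.items():
    print(f"{year}: {r:.1%} of the article's figure")
```

Each year lands between roughly 59% and 61%, which is what makes the "digits reversed" shorthand apt for 2014 without being literally true.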

“255,000 questionable as compared to 560,000 DOAJ” isn’t as astonishing as “nearly as many predatory as not.” That 420,000 figure has been cited a lot, mostly by critics of open access in general.

But there’s more to say…

The Second Cut

The authors were working from an earlier and much smaller pair of Beall lists than those that I worked from. I used the Wayback Machine to download versions of the list as close as possible to the versions they used (in both cases, later and presumably a little larger). Flagging publisher and journal listings from those earlier versions yields the figures in Table 4.1, including “UA” journals but excluding X-coded ones.

           2014      2013      2012
Journals   2,692     2,222     1,370
Articles   113,996   87,325    55,303

Table 4.1. Journals and articles based on Beall lists at the time of the Shen/Björk article

Now we’re down from 8,000 active journals to 2,692—and from 420,000 articles to just under 114,000. The percentages are still clustered: now the real numbers are 26% to 28% of those reported in the article. Even if you added 50% to my figures to account for a few dozen not-fully-counted journals (rather than the 5% to 10% I consider plausible), you’d be nowhere near 200,000, let alone 420,000. And, of course, 114,000 is a pretty small fraction of 560,000—just over one-fifth.
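The comparisons in this paragraph can be reproduced from Table 4.1 and the article's estimates. A minimal sketch, assuming the 560,000 DOAJ figure used earlier as the comparison base; the names here are illustrative only.

```python
# Table 4.1 counts (Beall lists at the time of the Shen/Björk article)
# compared with the article's published article estimates.
article_articles = {2012: 212_000, 2013: 310_000, 2014: 420_000}
table_4_1_articles = {2012: 55_303, 2013: 87_325, 2014: 113_996}
DOAJ_ARTICLES = 560_000  # DOAJ comparison figure cited in the text

shares = {year: table_4_1_articles[year] / article_articles[year]
          for year in sorted(article_articles)}
for year, share in shares.items():
    print(f"{year}: {share:.0%} of the article's estimate")

# Even a generous 50% allowance for undercounted journals stays well below 200,000.
padded_2014 = table_4_1_articles[2014] * 1.5
print(f"2014 with a 50% allowance: {padded_2014:,.0f}")
print(f"2014 relative to DOAJ: {table_4_1_articles[2014] / DOAJ_ARTICLES:.1%}")
```

The padded 2014 figure comes out around 171,000, and the unpadded figure is just over one-fifth of the DOAJ total.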

Even those numbers involve the odd assumption that Beall’s tagging is definitive. What happens if we reduce the universe to those journals and publishers where Beall actually made a case?

The Final Cut

           2014      2013      2012
Journals   936       781       488
Articles   29,947    21,500    13,198

Table 4.2. Journals and articles where Beall made a case

Table 4.2 shows the results: fewer than 30,000 articles in 2014—about 7% of the article’s estimate. (The 2012 and 2013 figures are 6% to 7% of the article’s estimates.) These are cases where Beall not only listed a publisher or journal at the time the authors downloaded the lists, but actually made a case for the journals or publishers being questionable or “predatory.”
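The "about 7%" and "6% to 7%" figures follow from dividing the Table 4.2 article counts by the article's estimates. A minimal sketch; the names are illustrative, and only figures already given in this chapter are used.

```python
# Table 4.2 counts (journals/publishers where Beall actually made a case)
# versus the article's published estimates for the same years.
article_articles = {2012: 212_000, 2013: 310_000, 2014: 420_000}
table_4_2_articles = {2012: 13_198, 2013: 21_500, 2014: 29_947}

shares = {
    year: table_4_2_articles[year] / article_articles[year]
    for year in sorted(article_articles)
}
for year, share in shares.items():
    print(f"{year}: {table_4_2_articles[year]:,} articles = "
          f"{share:.1%} of the article's estimate")
```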

Those numbers are too low—but they’re arguably what should have emerged from the study. As noted in Chapter 3, I believe realistic numbers are on the order of 120,000 for 2014; 90,000 for 2013; and 56,000 for 2012—still a lot of articles appearing in questionable journals, but not quite so alarmingly high.

What Went Wrong?

How could these two scholars be so far off? First, there’s the assertion that all journals on Beall’s lists are actually predatory. Second, the “stratified” random sampling method involves some tricky assumptions, based on a “suspicion” that was “verified” by sampling all of ten journals—the suspicion “that journals from small publishers often publish a much higher number of articles than those of large publishers.”

The sampling used in this study yielded a much lower percentage of empty journals than my 100% survey. The article estimates that 67% of listings represent active journals; my 100% survey (admittedly of a larger list) shows 40% active journals. That’s an enormous difference: instead of 8,000 active journals from the smaller list, you wind up with around 4,800. That’s probably about right (I show 5,988—but that’s from a much larger list).
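The step from 8,000 to around 4,800 is a simple rescaling. A minimal sketch, assuming (as the paragraph does) that the article's 8,000 active journals reflect a 67% active share of listings, and that the 40% active share from the full survey can be applied to that same implied listing count even though it came from a larger list. The function name is hypothetical.

```python
def rescale_active(active_estimate, estimated_share, observed_share):
    """Re-derive an active-journal count under a different active share.

    Backs out the implied number of listings (active_estimate / estimated_share),
    then applies observed_share to that same listing count.
    """
    implied_listings = active_estimate / estimated_share
    return implied_listings * observed_share


adjusted = rescale_active(8_000, 0.67, 0.40)
print(f"implied listings: {8_000 / 0.67:,.0f}")
print(f"active journals at a 40% share: {adjusted:,.0f}")
```

The result is just under 4,800, matching the rounded figure in the text.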

Beyond that, it appears that the sheer heterogeneity of journals makes projection from a small sample so dicey as to be useless. Unfortunately, I believe that to be the case.

GOAJ: March update

Friday, March 31st, 2017

It’s March 31–the last day of the month, when I fetch usage statistics for my websites (as always, omitting about 6 hours of that last day), so here’s an update on GOAJ. (I might have stopped doing these, but the GOAJ download numbers are astonishing, so…):

  • Paperbacks: No change. Two copies of GOAJ itself sold. So far, none of the others.
  • Dataset: 25 more views, 1,059 total views; 4 more downloads, 456 total downloads.
  • GOAJ: one additional Lulu, 45 total Lulu copies, 4,066(!) more copies from my site (18,689* total from my site); overall total 18,734 (actual total almost certainly over 19,000). Here’s the thing: that 4,066 figure not only represents more than 90% of all data (by bandwidth) from waltcrawford.name, it’s also mostly from spiders and other robots, not from people directly downloading. The latter appears to represent perhaps 700-800 copies, still a lot.
  • Subjects: One additional Lulu, 20 total Lulu copies, 43 additional, 375* other copies, 395* total.
  • Countries: No additional Lulu, 8 total Lulu copies, 242 additional, 1,587* total other copies, 1,595 total.
  • C&I: New totals 1,352* copies of the excerpted GOAJ version (16.5) and 4,154* copies of “APCLand and OAWorld” (16.4).

*Missing downloads from 11/13-11/30/16 and, for C&I, 11/13-12/15/16.

Gray OA

Gray OA 2012-2016 (Cites & Insights 17.1) shows a total of 1,120 downloads to date, and there is still no apparent recognition anywhere else that the Shen/Björk “predatory articles” numbers have been demonstrated to be so dramatically wrong; the dataset shows 228 views and 58 downloads.

Cites & Impasse: feedback desired

Friday, March 17th, 2017

In the most recent W.a.R. post, I said this:

In the meantime, other than various other stuff, there’s a possible Cites & Insights (if anybody cares–and based on recent readership levels, I’m not sure) and the question of following up on 3,300-odd journals that were in DOAJ on 1/1/16 but not on 1/1/17. And slowing down a bit.

I’m still unsure–and the title of this post, which started out as a typo, may be meaningful.

Here are the numbers:

  • The January 2017 Cites & Insights, Gray OA 2012-2016: Open Access Journals Beyond DOAJ, shows 1,043 total downloads, but 975 of those came before March and only 68 are in March 2017. I’d hoped that this study–which I spent (or wasted) way too much time on–would get, say, one-fifth the readership of Gold Open Access Journals 2011-2015 and might have some small effect on the discussions regarding “predatory” journals. (I’d really hoped that somebody might acknowledge that the “420K 2014 articles in predatory journals” figure was provably wrong–but I keep seeing that figure repeated.) [Remarkably, GOAJ 2011-2015 has another 2,099 downloads in the first half of March 2017!]
  • The February 2017 Cites & Insights, a fairly ordinary issue, has a total of 408 downloads to date, but only 82 in March: not terrible, but not impressive.

Readership is way down–and so is my motivation to write the [March? April? May? Spring?] issue–but not just because of declining readership, and partly for one reason that I think may be related to declining readership. So I’m offering up a couple of possible reasons and asking for feedback. C&I isn’t entirely going away [yet], but could become a mostly-OA-supporting-material outlet. Or not.

1. Dystopia Fatigue: 45 for the Loss

The reason that is definitely reducing my interest in writing and may be reducing others’ interest in reading C&I is that so much mental and emotional energy is spent trying to cope with the dystopian situation that could be summed up as 45–not only an administration that appears set on making America a mean-spirited, post-science, pathetic nation relying on bloated armaments to push actual great nations around, but also the newly-empowered racists and bigots who seem to feel that it’s now American to loudly proclaim the shameful feelings they once tended to keep to themselves.

It is draining to read the news. It is worse than draining to read some of the reactions. It is draining to try to determine what (other than the usual PPFA, ACLU, AU etc. checks) to do about it–and whether drastic actions are warranted.

I can only assume that others also find it draining, and may not feel like reading secondary/apolitical stuff like C&I that isn’t actually good “escapist” reading. (I’m just over halfway through The Devil’s Brood: is that escapist?)

For British readers, there’s a separate-but-related dystopian present going on.

It’s hard to argue with a lack of remaining energy. I will surely agree that real action that might help preserve what’s left of America’s greatness is a whole hell of a lot more important than reading (or writing) my stuff.

Now, getting off the soapbox:

2. Old, Repetitious and Largely Irrelevant

That’s the quick way of putting it.

I’m trying to do stuff that nobody else is doing, since I gladly affirm that younger, more energetic and probably brighter people can and should be doing the kinds of things I used to do. Without mentioning my age directly, I’ll note that our taxes for 2016 are heavily impacted by being required to either take certain payments starting last year or lose half of that money to the Feds.

The GOAJ studies are good examples of stuff nobody else is doing. I’d like to think that most C&I essays also fall into that category–but they may not be worth doing. As for repetitious and irrelevant…perhaps.

So…

[A few of you will wonder whether my continued lateral-nerve problem, being reduced to six-finger typing, is also a factor. No, the nerve still hasn’t recovered, and may or may not ever do so. But I managed to write all three booklength portions of GOAJ2011-2015 despite this problem, so while my typing continues to be much slower and less accurate than before March 2016, that’s not a major factor.]

  • Should I spend most of the “pause”–the next three or four weeks, before Phase 2 of the GOAJ2011-2016 research and then all the analysis and writeup–on revisiting the 3,000-odd “departed” journals for a supplemental chapter and just let C&I lie dormant? And use leftover time to catch up on reading…
  • Should I try to split the time between that revisit [which turns out to be reasonably fast because I’m only looking at 2016 availability and article counts, not APC levels] and doing a C&I issue? [Which would probably consist of one medium-length roundup on access & economics and one relatively brief roundup on the disappearing blacklists.]
  • Other suggestions?

Comments are open. I’m interested in your feedback.


Updated March 22, 2017:
I’m still looking for feedback of all sorts. If your comment doesn’t show up, it may be awaiting moderation or possibly deleted as spam–I’ve had to change spam control (from Spam Kismet 2, which no longer seems compatible, to WP-SpamShield), and I no longer see spam-trapped comments. If your comment doesn’t show up within a day of posting, you can always email it to me (waltcrawford@gmail.com); if you note “Intended as a post comment,” I’ll add it here.

GOAJ16: A pause in the process

Wednesday, March 15th, 2017

Yesterday, I completed the first pass of the data-gathering process for Gold Open Access Journals 2011-2016 — I’ve now visited or attempted to visit all 9,430 journals in DOAJ as of January 1, 2017.

Around 1,400 of those need to be revisited–either because there’s likely to be additional 2016 data or because there were problems of some sort. That process will start some time in mid-April.

In the meantime, other than various other stuff, there’s a possible Cites & Insights (if anybody cares–and based on recent readership levels, I’m not sure) and the question of following up on 3,300-odd journals that were in DOAJ on 1/1/16 but not on 1/1/17. And slowing down a bit.

I have no firm idea what the final numbers will be, but “around half a million” is a ballpark estimate. Note that most of the journals added to DOAJ in 2016 are *not* brand-new journals, so counts for previous years do change.

GOAJ: February 2017 update

Tuesday, February 28th, 2017

It’s February 28–the last day of the month, when I fetch usage statistics for my websites (as always, omitting about 6 hours of that last day), so here’s an update on GOAJ. (I might have stopped doing these, but the GOAJ download numbers are astonishing, so…):

  • Paperbacks: No change. Two copies of GOAJ itself sold. So far, none of the others.
  • Dataset: 30 more views, 1,034 total views; 7 more downloads, 452 total downloads.
  • GOAJ: two additional Lulu, 44 total Lulu copies, 3,039(!) more copies from my site (14,623* total from my site); overall total 14,667 (actual total almost certainly over 15,000). I dunno: I find that 3,039 figure astonishing at this point.
  • Subjects: No additional Lulu, 19 total Lulu copies, 39 additional, 332* other copies, 351* total.
  • Countries: No additional Lulu, 8 total Lulu copies, 179 additional, 1,345* total other copies, 1,353 total.
  • C&I: New totals 1,298* copies of the excerpted GOAJ version (16.5) and 4,071* copies of “APCLand and OAWorld” (16.4).

*Missing downloads from 11/13-11/30/16 and, for C&I, 11/13-12/15/16.

Gray OA

Gray OA 2012-2016 (Cites & Insights 17.1) shows a total of 975 downloads to date, and there is still no apparent recognition anywhere else that the Shen/Björk “predatory articles” numbers have been demonstrated to be so dramatically wrong; the dataset shows 189 views and 43 downloads.

Halfway through: a quick note on GOAJ16

Thursday, February 9th, 2017

As of a few minutes ago I’m just over halfway through the initial data gathering pass for Gold Open Access Journals 2012-2016 (that date range may turn out to be 2011-2016 if I can figure out the formatting: I’m gathering/keeping the 2011 data).

That is: I’ve done 4,740 journals, and there are 4,710 left to do. (I do them 20 at a time–which can take anywhere from half an hour to two hours or more–so the precise halfway point wasn’t a good place to pause.)

A few tidbits on the first 4,740:

  • 784 of them will be checked again no earlier than April, either because there were problems or because they seemed likely to have more 2016 articles/issues posted a bit later in 2017.
  • 685 of those checked so far are new to the list–but only 50 of those actually started publishing in 2016.
  • 4,346 so far are “good”–tagged either A or B.
  • 36 are duplicates, either cases of slight changes in titles or two language versions or…: in each case, if I catch it, only one gets counted.
  • The rest have some problem–malware, unreachable, unusable, etc. All those will be revisited.
  • 2,621 do not charge APCs; 1,919 do have stated APCs.

Nothing profound. Just a 15-minute break before heading back to the process. I may or may not be halfway through in terms of time required: note that “half hour to two hours or more” range. At least I know the first 20 of the second half will be quick (the rest of Libertas Academica, now part of Sage: predictable layout and easy to count by year of publication).

Gold Open Access Journals: January update

Tuesday, January 31st, 2017

It’s January 31–the last day of the month, when I fetch usage statistics for my websites (as always, omitting about 6 hours of that last day), so here’s an update on GOAJ:

  • Paperbacks: No change. Two copies of GOAJ itself sold. So far, none of the others.
  • Dataset: 26 more views, 1,004 total views; 5 more downloads, 445 total downloads.
  • GOAJ: no additional Lulu, 42 total Lulu copies, 2,422(!) more copies from my site (11,584* total from my site); overall total 11,626 (actual total almost certainly over 12,000).
  • Subjects: No additional Lulu, 19 total Lulu copies, 82 additional, 293* other copies, 312* total.
  • Countries: No additional Lulu, 8 total Lulu copies, 112 additional, 1,166* total other copies, 1,174 total.
  • C&I: New totals 1,223* copies of the excerpted GOAJ version (16.5) and 3,999* copies of “APCLand and OAWorld” (16.4).

*Missing downloads from 11/13-11/30/16 and, for C&I, 11/13-12/15/16.

Gray OA and the state of C&I

Gray OA 2012-2016 (Cites & Insights 17.1) shows a total of 818 downloads to date, and there is no apparent recognition anywhere else that the Shen/Björk “predatory articles” numbers have been demonstrated to be so dramatically wrong; the dataset shows 129 views and only 19 downloads. I’d already concluded that it was crazy to consider updating the study (which probably involved more work than GOAJ); the lack of interest confirms that conclusion–and, of course, the source material’s disappeared in any case.

As for C&I: the balance between new issues and work on the second edition of GOAJ (2012-2016, or maybe 2011-2016) can’t help but be swayed by the figures for C&I 17.2: 166 total downloads to date. Issue 17.3 will emerge…eventually.

Missing those lists? Never fear…

Tuesday, January 17th, 2017

It appears that the content in Beall’s blog disappeared a few days ago, including the notorious lists of ppppredatory publishers and journals.

I have no inside information as to what happened.

Here’s the thing, though:

In addition to the usual Internet Archive approach to finding slightly earlier versions of the lists, I can recommend the following–with the caveat that I regard the lists as useless and damaging as “blacklists” but useful as a broad directory of gray/gold OA (gold OA not in DOAJ):

  • There’s a spreadsheet including all the journals from both lists as of July 8, 2016–including publishers, journals, URLs, but also article counts for each journal for 2012, 2013, 2014, 2015, and the first half of 2016, as well as the current APC (as of late 2016) and my status code for each journal.
  • Gray OA 2012-2016: Open Access Journals Beyond DOAJ, the January 2017 Cites & Insights, provides full analysis of this universe and how it meshes with the larger DOAJ universe, and even a breakdown of the vastly inflated “predatory” numbers in one piece of published research.

Both free, both CC-BY; the first is the master dataset for the second. Neither has been seen by all that many people, which is sort of a shame.