Archive for the 'open access' Category

Open Access Journals: New Grade Summary

Posted in open access on February 4th, 2015

As noted in the current Cites & Insights, I’ve moved 580 DOAJ journals from Grade B to Grade A$ because the only reason to regard them as possibly requiring investigation is that they have APCs of $1,000 or more. That’s something to be aware of, and the justifications for high APCs still need discussion, but if an author has the money and finds the APC reasonable, there’s nothing else about these journals to raise concerns.

The Library Technology Reports issue this summer will reflect that change, but none of the existing C&I coverage does.

Here’s a table (that probably won’t appear in this form in the report) that shows the number of journals and 2013 articles in each grade, as revised.

Grade  Desc.                           Journals  %J  Articles  %A
A      Apparently good
A$     Apparently good with high APC
B      May need investigation
C      Highly questionable
DC     Ceased
DD     Dying
DE     Erratic
DH     Hiatus?
DN     New?
DS     Small
E      Empty
EC     Empty/cancelled
N      Not OA
O      Opaque
X      Unreachable or unworkable

If you draw the conclusion from this table that journals with high APCs publish a lot of articles, you wouldn’t be wrong.

Do we need OA megajournals in humanities & social sciences?

Posted in open access on December 29th, 2014

I can’t answer that question, of course. I can offer some factual input.

I’ve now looked at all of the journals in the Directory of Open Access Journals (as of May 2014) that have enough English in their interface for me to be able to (a) determine whether the journal charges article processing fees (or submission fees or whatever) and, if so, how much those fees amount to, (b) determine that they are in fact publishing refereed scholarly articles and (c) determine how many such articles they’ve published in 2011, 2012, 2013 and the first half of 2014.

That caveat is because somewhere north of 2,000 journals in DOAJ either didn’t have English or Eng as one of the languages in their DOAJ record or, when I went there, did not have enough English for me to be able to do those things. So I’ve only looked at 7,301 DOAJ journals (plus another 6,949 “Beall journals”–most of them not actually journals–that weren’t in DOAJ at that point and another 401 OASPA-member journals that weren’t in DOAJ, in many cases because they’d ceased publishing).

Within those 7,301 journals, here’s, briefly, what I found for humanities & social sciences (omitting the few journals with unknown/unstated APCs–there are a dozen such journals in this group):

Humanities alone

(OK, so my definition of humanities may not be the same as yours, but set that aside…)

  • Journals with APCs that published some articles between 2011 and June 30, 2014: 38 journals, publishing around 1,750 articles in the first half of 2014, around 3,200 in 2013, around 2,800 in 2012 and around 2,150 in 2011. (Median APC: $300.)
  • Journals with no APCs–free on both sides–that published some articles between 2011 and June 30, 2014: 745 journals, publishing around 5,850 articles in the first half of 2014, around 12,700 in 2013, around 12,850 in 2012, and around 11,400 in 2011.
  • That adds up to around 15,900 articles in 2013 and around 15,600 in 2012; the 2014 numbers may be slightly lower, but a lot of these journals only post issues once a year, so it’s too early to say.

Humanities and social sciences (which includes all of the above)

  • Journals with APCs (as above): 270 journals, publishing around 8,200 articles in the first half of 2014, around 14,500 in 2013, around 13,500 in 2012 and around 10,200 in 2011. (Median APC $203.)
  • Journals without APCs (free on both sides): 1,930 journals, publishing around 16,100 articles in the first half of 2014, around 37,700 in 2013 and the same in 2012; around 33,650 in 2011.
  • That adds up to around 52,000 articles in 2013 and around 51,200 in 2012.

So I guess the question is: are there tens of thousands of worthwhile articles out there that aren’t getting published because there aren’t enough good OA journals in HSS? Note that the average no-fee humanities journal only publishes about 17 articles a year; if each one added four more articles–probably not an overwhelming addition to the presumably-volunteer editors’ workloads–that would take care of another 3,000-odd articles.
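For anyone who wants to check that back-of-the-envelope arithmetic, here is a minimal Python sketch. It uses only the rounded 2013 humanities figures quoted above; the variable names are illustrative, and the "four more articles" scenario is the hypothetical from the paragraph, not a measured number.

```python
# Rounded 2013 humanities figures quoted in the post.
no_fee_journals = 745
no_fee_articles_2013 = 12_700
apc_articles_2013 = 3_200

# Average output of a no-fee humanities journal in 2013.
avg_per_journal = no_fee_articles_2013 / no_fee_journals

# Hypothetical scenario: every no-fee journal adds four articles a year.
extra_per_journal = 4
added_articles = no_fee_journals * extra_per_journal

# Fee and no-fee humanities articles combined for 2013.
total_2013 = no_fee_articles_2013 + apc_articles_2013

print(f"Average articles per no-fee journal: {avg_per_journal:.1f}")
print(f"Articles added at +{extra_per_journal} each: {added_articles:,}")
print(f"All humanities articles, 2013: {total_2013:,}")
```

Because the inputs are already rounded, the results are only as precise as the "around" figures they start from.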

I’m not part of the academy or The Academy. I don’t know what’s actually needed. I am a little suspicious of grand schemes…but that’s just me.

If you’re wondering: I will have some summary figures and notes on the completion of this absurdly large investigation in the March 2015 Cites & Insights, out some time in February 2015; a thoughtful, edited, complete, coherent view (with advice for librarians) will appear in the summer from a publisher I regard as highly reputable, but it will carry a price.

Comments are open on this post.

Two weeks in: a quick update

Posted in Cites & Insights, open access on December 16th, 2014

Cites & Insights 15.1, January 2015, was published two weeks ago, featuring the “third half” of my vast-but-incomplete survey of gold OA in 2011-2014, along with some additional notes related to gold OA.

“Going for the gold: OA journals in 2014: any interest?”–asking whether a coherent, well-organized look at the overall state of OA journals in 2014 (or, really, 2011-2014), based on an even larger survey of the journals, done as a paperback book, would be of any interest–appeared the next day, December 3, 2014. Essentially the same text appeared as one of the shorter pieces in the “third half” essay.

As of this morning (at 5 a.m., when the daily statistics run for month-to-date happens), December 16, 2014, C&I 15.1 is doing OK in terms of readership: 1,355 downloads to date (1,168 of the print-oriented two-column version, 187 of the 6×9″ single-column version). Those are strong numbers; I’d like to think the issue’s having some mild impact.

As of this morning, total non-spam responses to the other post (and to the piece in C&I) are a little less strong. 1,355 less strong, to be exact. (Lots of spamments, but that happens any time I turn comments on.)

That’s a shame, but it’s also reality.

Meanwhile, I’m now a little more than halfway in scanning the remaining 2,200-odd journals, which are now down to 1,800-odd as I remove journals where there’s not enough English in the interface for me to determine whether they have article processing charges and how their issue archives work. That is: I have 1,010 journals that I’ve been able to record information on, with 800-odd to go, but I imagine another 100+ will disappear in that process.

A word to OA publishers who are trying to offer an English interface without actually doing any work: Having an English flag (either literally a flag or a pull-down list option) is really sort of pointless if all it does is change the OJS menu headings to English, with all the text linked from them still in the primary language of the journal. Cute, but pointless.

But at least better than the journals hosting malware…and I think I have one of you to “thank” for spending most of a day last week recovering from a nasty little Trojan disguised as a Flash update. I saw a second attempt this week, but the combination of anti-crap software I’m running flagged it immediately.

Oh, just as a sidebar, here are some year-to-November-30* figures for OA-related essays in Volume 14:

  • April 2014, 14:4 (The Sad Case of Jeffrey Beall and another essay): 2,781 two-column plus 3,393 single-column (a rare case in which the single-column outdid the two-column), for a total of 6,174, a big number for C&I: by far the largest 2014 download count for any issue of C&I (that’s out of some 176,000 total downloads through November 30, although as noted in the footnote below that’s missing 11 days, the last day of each month).
  • May 2014, 14:5 (The So-Called Sting and another essay): 1,690 two-column plus 1,283 single-column, for a total of 2,973, also a very good number.
  • July 2014, 14:7 (Journals, “Journals” and Wannabes): 1,839 two-column plus 1,042 single column, for a total of 2,881, which is very good, especially noting that the window is getting smaller.
  • October/November 2014, 14:10 (Journals and “Journals”: Taking a Deeper Look): 817 two-column plus 239 single-column for a total of 1,056. Not bad for a relatively brief period.
  • December 2014, 14:11 (Journals and “Journals” Part 2): 998 two-column plus 456 single-column, for a total of 1,454, which is pretty good given that it came out on November 2, so that’s one month’s readership.

The three Journals and “Journals” issues show 96, 27, and 88 additional downloads for December 1-15, respectively.

*Technically, November 29: because of how the statistics run, I never actually see the figures for the final day of a given month.

Update December 18, 2014: Comments now turned off. The question of whether or not to write a Publish-on-Demand paperback based on all of this has been rendered moot, in a way that will serve libraries quite well, I believe.

Going for the Gold: OA Journals in 2014: any interest?

Posted in C&I Books, open access on December 3rd, 2014

[Adapted and slightly updated from the January 2015 C&I, partly so you can comment directly at the end.]

I’m toying with the idea of doing an updated, expanded, coherent version of Journals and “Journals”: A Look at Gold OA. Current working title: Going for the Gold: OA Journals in 2014.

The book would use a very large subset of DOAJ as it existed in May 2014 as the basis for examining gold OA—with sidebars for the rest of Beall (most of which is “journals” rather than journals) and the rest of OASPA (which doesn’t amount to much). It would assume a four-part model for some of the discussion (megajournals, bio/med, STEM other than biology, and HSS).

But it would also add even more DOAJ journals, drawn from around 2,200 that have English as one language but not the first one (and a few hundred that were somehow missed in the latest pass). Based on a sampling of 200-300 or so, I’d guess that this would yield 500 to 1,000 more journals (that are reachable, actually OA, and have enough English for me to verify the APC, if any, verify that it’s actually peer-reviewed scholarship, and cope with the archives), possibly fewer, possibly more.

Update: At this point, I’ve recorded information for 200—well, 199—additional journals, but in the process I see that the last row in the spreadsheet has gone from something over 2,200 to a current 2,107, as I delete journals where there isn’t enough English available for me to determine the APC or that there isn’t one, determine that the journal appears to be scholarly research articles, and navigate the archives. Since close to 30% of the 200 journals are either unreachable, aren’t OA as I’m defining it, or are set up so that I find it impossible to count the number of articles, that suggests—and suggests is the right word—that I might get something like 1,400 journals of which something like 1,000 provide useful additional information. But journals are wildly heterogeneous: the actual numbers could be anywhere from 250 to 1,900 or so. Best guess: around 800-1,200 useful additions.

There would still be a portion of DOAJ as of May 2014 not included: journals that don’t include English as one of their possible languages and those that don’t have enough English for a monolingual person to make sense of them. That group includes at least 1,800 journals.

The paperback might also include the three existing pieces of Journals and “Journals,” depending on the length and final nature of the new portion. If so, the old material would follow the new. The paperback would cost $45 (I think), and a PDF ebook would be the same price.

Update: More likely, the paperback would not include the three existing pieces but would add some additional analysis—e.g., proportion of free and APC-charging journals by country of origin.

Since curiosity hasn’t quite killed me off yet, I may do this in any case, but it would be a lot more likely if I thought that a few people (or libraries or institutions or groups involved with OA) would actually buy it. If you’re interested—without making a commitment—drop me a line at saying so (or leave a comment on this post).

Of course, if some group wanted this to be freely available in electronic form, I’d be delighted, for the price of one PLOS One accepted article without waivers: $1,350. With that funding, I’d also reduce the paperback price to Lulu production cost plus $2.

If some group was really interested in an updated look at all this—including full-year 2014 numbers for DOAJ and the rest of OASPA (but not the rest of Beall: life really is too short)—I’d be willing to consider doing that, which would be a lot more work, possibly for, say, the amount of the APC for Cell Reports: $5,000. I don’t plan to hold my breath for either offer, although the first doesn’t seem entirely out of the question.

You know where to find me.

[Updated 9:35 a.m.: Comments turned on. Oops.]

Updated December 18, 2014: Comments turned off again. This possibility–a print-on-demand self-published paperback based on all of this research–has been rendered moot by developments. There will, in fact, be a coherent overview with additional material, available some time in 2015, aimed at library needs. It will not be a Cites & Insights Book.

Cites & Insights 15:1 (January 2015) available

Posted in Cites & Insights, open access on December 2nd, 2014

The January 2015 issue of Cites & Insights (15:1) is now available for downloading at

The print-oriented two-column version is 28 pages long.

If you’re reading online or on an e-device, you may prefer the single-column 6″x9″ version, which is 57 pages long.

The issue includes:

Intersections: The Third Half    pp. 1-21

Most of this essay (pp. 7-19) is the “Third Half” of the two-part Journals and “Journals” examination in the October/November and December 2014 issues–adding another 1,200-odd bio/med journals from DOAJ and looking at overall patterns. The essay also includes four briefer discussions related to DOAJ and gold OA journals.

The Back   pp. 21-28

A baker’s dozen of sometimes-snarky mini-essays.


Announcing C&I Volume 14, the paperback version (with bonuses!)

Posted in C&I Books, Cites & Insights, open access on November 28th, 2014

The paperback annual Cites & Insights 14 (2014) is now available for purchase at

The 344-page 8.5×11″ trade paperback (printed on 60# white paper) includes all eleven issues of Cites & Insights 14 and a table of contents. It also includes three exclusive bonuses:

  • An index (actually two indexes: one for articles quoted in the volume, the other for names, topics and the like).
  • A wraparound color cover.
  • To complete the Journals and “Journals” series, an essay that will also appear as the first 20+ pages of the January 2015 Cites & Insights (to be published some time in December 2014).

While Volume 14 includes several essays related to ebooks (and print books, libraries, textbooks), magazines, futurism (in general and as applied to libraries) and more, the obvious focus of much of the year was open access–specifically, a series on access and ethics and a major series of all original research on Journals and “Journals,” looking at the nature of gold OA journals in 2011-2014 through actual examination of the websites of more than ten thousand journals and “journals” (the latter being things called journals that have never actually published any articles).

The paperback sells for $45 (as do all C&I Annuals), and helps to support C&I.

About that partial essay…

Posted in Cites & Insights, open access on November 20th, 2014

In “The Size of the Open Access Market (and an admission)” I said that the January 2015 issue would include a cleaned-up version of that post, some stuff that was originally supposed to be part of the December 2014 issue–and a partial completion of the DOAJ set, looking at the 1,200+ biology and medicine journals.

The full completion was planned as a special edition only appearing in the bound PoD paperback C&I Annual for 2014–and possibly as part of a separate book on Journals and “Journals.”

There’s a change, as noted in the second postscript to that post: I’ve given up on the “special edition” idea and have now included the full “third half” of the Journals and “Journals” Second Look in the January 2015 issue. Which will arrive, I don’t know, sometime before January 1, 2015.

A separate book? Still up in the air.

The Size of the Open Access Market (and an admission)

Posted in Cites & Insights, open access on November 14th, 2014

On October 29, 2014, Joseph Esposito posted “The Size of the Open Access Market” at the scholarly kitchen. In it, he discusses a Simba Information report, “Open Access Journal Publishing 2014-2017.” (I’m not copying the link because it’s just to the blurb page, not to any of the info that Esposito provides.) The 61-page Simba report costs a cool $2,500 (and up), so I can’t give you any detail on the report itself other than what Esposito passes along.

The key portion of what he passes along, quoting Esposito directly:

Simba notes that the primary form of monetization for OA journals is the article processing charge or APC. In 2013 these fees came to about $242.2 million out of a total STM journals market of $10.5 billion. I thought that latter figure was a bit high, and I’m never sure when people are quoting figures for STM alone or for all journals; but even so, if the number for the total market is high, it’s not far off.  That means that OA is approximately 2.3% of the total journals market (or is that just STM . . . ?)….

And, quoting from one of the comments (it’s a fascinating comment stream, including some comments that made me want to scream, but…):

If those numbers are roughly right, then 2.3% of the scholarly publishing revenue equates to something like 22% of all published papers.

That comment is by Mike Taylor, who’s active in this comment stream.

I had no idea whether the Simba numbers made any sense and what magic Simba performed to get numbers from the more than two thousand Gold OA publishers (my own casual estimate based on DOAJ publisher names), but hey, that’s why Simba can get $2,500 for 61 pages…

The admission

There turned out to be a mistake or, if you will, a lie in the December 2014 Cites & Insights, on the very last page, top of the second column, the parenthetical comment. When I wrote that, I fully intended to sample perhaps 10%-20% of the 1,200+ bio/biomed/medical DOAJ journals not in the OASPA or Beall sets to get a sense of what they were like…

…and in the process realized what I should already have known: the journals are far too heterogeneous for sampling to mean much of anything. And, once I’d whittled things down, 1,200+ wasn’t all that bad. Long story short: I just finished looking at those journals (in the end, 1,211 of them–of the original 1,222, a few disappeared either because they turned out to be ones already studied or, more frequently, because there was not enough English in the interface for me to look at them sensibly).

Which means that I’ve now checked–as in visited and recorded key figures from–essentially all of the DOAJ journals (as of May 7, 2014) that have English as the first language code, in addition to some thousands of Beall-set journals and hundreds of OASPA journals that weren’t in DOAJ at that point.

Which means that I could do some very rough estimates of what a very large portion of the Gold OA journal field actually looks like.

Which means I could, gasp, second-guess Simba. Sort of. For $0 rather than $2,500.


The numbers I’m about to provide are based on my own checking of some absurdly large number of supposed Gold OA journals, yielding 9,026 journals that actually published articles between January 1, 2011 and June 30, 2014. The following caveats (and maybe more) apply:

  • A few thousand Gold OA journals in DOAJ that did not have English as the first language code in the downloaded database aren’t here. Neither are some number that did have English as the first language code but did not, in fact, have enough English in the interface for me to check them properly.
  • So-called “hybrid” OA journals aren’t here. Period.
  • Journals that appeared to be conference proceedings were omitted, as were journals that require readers to register in order to read papers, journals that impose embargoes, journals that don’t appear to have scholarly research papers and a few similar categories.
  • Some number of journals aren’t included because I was unable or unwilling to jump through enough hoops to actually count the number of articles. (See the October/November and December issues for more details; including the additional DOAJ bio/biomed/medical set, it comes to about 560 journals in all, most of them in the Beall set.)
  • I used a variety of shortcuts for some of the article counts, as discussed in the earlier essays.
  • Maximum potential revenue numbers are based on the assumptions that (a) all counted articles are in the original-article category, (b) there were no waivers of any sort, (c) the APC stated in the summer of 2014 is the APC in use at all times.

All of which means that these numbers are approximate–the potential revenue figures more so than the article-count figures, I think, since quite a few fee-charging journals automatically reduce APCs for developing nations (as one example). On the other hand, some of the caveats mean that I’m likely to be undercounting (the first four bullets) while the last bullet certainly means I’m overstating. Do they balance out? Who knows?

Second-guessing Simba

OK, here it goes:

Given all those caveats, I come up with the following for 2013:

  • Maximum revenue for Gold OA journals with no waivers: $249.9* million
  • Approximate number of articles published: 403* thousand

And, just for fun, here’s what I show for 2012:

  • Maximum revenue for Gold OA journals with no waivers: $200.2 million
  • Approximate number of articles published: 331 thousand

Here’s what’s remarkable: that maximum revenue of $249.9 million, which is almost certainly too high but which also leaves out “hybrid” journals and a bunch of others, is, well, all of 3.2% higher than Simba’s number.

Which I find astonishingly close, especially given the factors and number of players involved (and Simba’s presumed access to inside information, which I wholly lack).
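The two percentages in play here (OA's share of the journals market and the gap between the two revenue estimates) can be reproduced from the quoted figures. This is just a sketch of that arithmetic; the variable names are illustrative, and the $10.5 billion market figure is Simba's as relayed by Esposito, with the same STM-or-everything uncertainty he flags.

```python
# All amounts in millions of dollars, for 2013.
simba_apc_revenue = 242.2   # Simba's APC figure, as quoted by Esposito
journals_market = 10_500.0  # $10.5 billion total market, as quoted
max_revenue = 249.9         # maximum potential revenue, no waivers

# OA revenue as a share of the total market.
oa_share_pct = simba_apc_revenue / journals_market * 100

# How far the maximum-revenue estimate sits above Simba's number.
gap_pct = (max_revenue / simba_apc_revenue - 1) * 100

print(f"OA share of market: {oa_share_pct:.1f}%")
print(f"Estimate vs. Simba: {gap_pct:.1f}% higher")
```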

(The 22% of all published papers? Close enough…although it should be noted that 403 thousand includes humanities and social sciences.)

Incidentally, 33 journals account for the first $100 million of that 2013 figure, including one that’s in the social sciences if you consider psychology to be a social science. Not to take away too much from what will appear elsewhere eventually, but if you sort by three major lumps, you get this:

  • Science, technology, engineering and mathematics (excluding bio/biomed/medicine): $66.0 million maximum potential revenue in 2013 for 170 thousand articles; $54.3 million maximum in 2012 for 138 thousand articles. Around 3,500 journals.
  • Biology and medicine: $174.5 million maximum potential revenue in 2013 for 180 thousand articles; $139.0 million maximum in 2012 for 150 thousand articles. Around 3,100 journals.
  • Humanities and social sciences (including psychology): $9.4 million maximum potential revenue in 2013 for 55 thousand articles; $6.9 million maximum in 2012 for 45 thousand articles. Around 2,400 journals.

Those are very raw approximate numbers, but I’d guess the overall ratios are about right. The gold rush is in bio/biomed/medicine: is anybody surprised?
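To make those ratios concrete, here is a small Python sketch that turns the three lumps into maximum potential revenue per article and share of total revenue. It relies only on the rounded 2013 figures in the bullets; the dictionary keys and variable names are illustrative. The rounded article counts sum to 405 thousand rather than the 403 thousand reported overall, presumably because each lump was rounded separately.

```python
# 2013 figures as rounded in the post:
# (maximum potential revenue in $ million, articles in thousands)
lumps_2013 = {
    "STEM (excluding bio/med)": (66.0, 170),
    "Biology and medicine": (174.5, 180),
    "Humanities and social sciences": (9.4, 55),
}

total_revenue = sum(rev for rev, _ in lumps_2013.values())
total_articles = sum(count for _, count in lumps_2013.values())

# $ million divided by thousands of articles, scaled to dollars per article.
per_article = {
    name: rev * 1_000 / count for name, (rev, count) in lumps_2013.items()
}

for name, (rev, _) in lumps_2013.items():
    share = rev / total_revenue * 100
    print(f"{name}: up to ${per_article[name]:,.0f} per article, "
          f"{share:.0f}% of potential revenue")
```

The per-article figures are ceilings rather than averages actually paid, since they inherit the no-waivers assumption.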

What’s coming

I probably shouldn’t post this at all, since it weakens the January 2015 Cites & Insights, but what the heck…

In any case, now that I’ve looked at the 1,200+ additional journals, I will, of course, discuss those numbers.

(Credit to the late great Tom Magliozzi) The third half of the Journals and “Journals” deeper look will appear in part in the January 2015 Cites & Insights, out some time in December 2014 (Gaia willing and the creeks don’t rise).

That third half will be part of a multipart Intersections essay that also offers a few comments on the current DOAJ criteria (a handful of nits with a whole lot of praise) and considers the possibility that there’s a (dis)economy of scale in Gold OA publishing.

“In part”? Well, yes. I’ll do a discussion of the bio/med DOAJ subset that’s comparable to what I did for the other three sets of Gold OA journals, and I might include a few overall numbers. [See second postscript]

But there may be some more extended discussion of the overall numbers and how they break down (and maybe what they mean?), and that discussion might appear as a special section in the 2014 Cites & Insights Annual paperback, offering added value for the many (OK, maybe one so far) who purchase these paperbacks. It’s also possible that a complete retelling of this story will come out as a print on demand book, one that most definitely won’t be free, if I think there’s enough to add value. [See second postscript]

(Projections? I don’t do projections. I can say that, if the second half of 2014 equals the first half, there would be about 12% more Gold OA articles this year than last. I believe the Great OA Gold Rush of 2011-2013 is settling down…and that’s probably a good thing.)

Postscript, noon PST: I’ve enabled comments. I post so rarely these days that I’d forgotten that they’re now off by default.

Postscript, November 20, 2014:
After writing the abbreviated discussion (not that abbreviated: 14.5 C&I pages) and the full version, and letting it sit for a day or two, I’ve concluded that the full version doesn’t really add enough value for me to make a serious case that people should spend $45 for the paperback C&I Annual if they wouldn’t buy it otherwise. I think the Annuals are great and worth the money, but it’s pretty clear nobody else does.

So the full version–19 pages in the two-column format–will be the primary essay (or set of related essays) in the January 2015 volume, and the 2014 Annual will only add a wraparound cover and an index to the contents of the eleven 2014 issues. I’ve added strikeouts to the text above as appropriate.

As for a possible PoD book on Journals and “Journals”: still thinking about it.

*Additional postscript, December 27, 2014:

I’ve now gone through the rest of the DOAJ entries that offer English as one language possibility–another 2,200-odd, of which around 1,500 actually offered enough English for me to make sense of them. I’ve also gone through DOAJ itself for journals where I found it difficult to count articles directly (e.g., undated archives or archives consisting of whole-issue PDFs).

The bottom-line counts for articles and possible revenue for 2013 now come out to around 448,000 articles and around $261 million. Of that, around 366,000 and $231 million are from journals in DOAJ; Beall journals that aren’t in DOAJ–theoretically a larger number of journals, actually not–account for another 76,000 articles in 2013 (around 21% of DOAJ’s numbers) and around $22 million in potential revenue (around 9% of DOAJ numbers). The few hundred OASPA journals that aren’t in DOAJ account for fewer than 6,000 articles (less than 2% of DOAJ) and around $9 million (4% of DOAJ).
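The percentages in that paragraph follow directly from the rounded counts. A minimal sketch, with illustrative names, treating "fewer than 6,000" as an upper bound of 6 thousand:

```python
# 2013 figures from the postscript:
# (articles in thousands, potential revenue in $ million)
doaj = (366, 231)
beall_not_doaj = (76, 22)
oaspa_not_doaj = (6, 9)  # "fewer than 6,000" articles, used as an upper bound


def pct_of_doaj(group):
    """Return a group's articles and revenue as percentages of the DOAJ figures."""
    return tuple(round(g / d * 100, 1) for g, d in zip(group, doaj))


total_articles = doaj[0] + beall_not_doaj[0] + oaspa_not_doaj[0]
total_revenue = doaj[1] + beall_not_doaj[1] + oaspa_not_doaj[1]

print("Beall vs. DOAJ (articles %, revenue %):", pct_of_doaj(beall_not_doaj))
print("OASPA vs. DOAJ (articles %, revenue %):", pct_of_doaj(oaspa_not_doaj))
print(f"Totals: {total_articles} thousand articles, ${total_revenue} million")
```

The rounded revenue parts add up to $262 million, a hair above the $261 million total, which is consistent with each piece being rounded on its own.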

Some additional figures may appear in the March 2015 Cites & Insights; a coherent writeup of the whole OA journal scene (2011 through the first half of 2014)–or at least the very large portion of it I could investigate, essentially everything except 2,000-odd DOAJ journals that do not provide any form of English access–will appear next summer. More details later.

Open Data, Crowdsourcing, Independent Research and Misgivings

Posted in Cites & Insights, open access on September 1st, 2014

or Why Some Spreadsheets Probably Won’t Become Public

If you think that title is a mouthful, here’s the real title:

Why I’m exceedingly unlikely to make the spreadsheet(s) for my OA journals investigations public, and why I believe it’s reasonable not to do so.

For those of you on Friendfeed, there was a discussion on specifically this issue beginning August 26, 2014. The discussion was inconclusive (not surprisingly, partly because I was being a stubborn old goat), and I continued to think about the issues…even as I continued to build the new spreadsheet(s) for the project I hope to publish in the November and December 2014 Cites & Insights, if all goes well, cross several fingers and toes.

Consider this a public rethinking. Comments are most definitely open for this post (if I didn’t check the box, let me know and I’ll fix it), or you’re welcome to send me email, start a new thread on one of the social media I frequent (for this topic, Friendfeed or the OA community within Google+ seem most plausible), whatever…

Starting point: open data is generally a good idea

There may be some legitimate arguments against open datasets in general, but I’m not planning to make them here. And as you know (I suspect), I’m generally a supporter of open access; otherwise, I wouldn’t be spending hundreds of unpaid hours doing these investigations and writing them up.

All else being equal, I think I’d probably make the spreadsheet(s) available. I’ve done that in the past (the liblog projects, at least some of them).

But all else is rarely equal.

For example:

  • If a medical researcher released the dataset for a clinical trial in a manner that made it possible to determine the identities of the patients, even indirectly, that would be at best a bad thing and more likely actionable malpractice. Such datasets must be thoroughly scrubbed of identifying data before being released.

But of course, the spreadsheets behind Journals, “Journals” and Wannabes: Investigating The List have nothing to do with clinical trials; the explicitly named rows are journals, not people.

That will also be true of the larger spreadsheets in The Current Project.

How much larger? The primary worksheets in the previous project have, respectively, 9,219 [Beall’s Lists] and 1,531 [OASPA] data rows. The new spreadsheets will have somewhere around 6,779 [the subset of Beall’s Lists that was worth rechecking, but not including MDPI journals], exactly 1,378 [the subset of OASPA journals I rechecked, including MDPI journals], and probably slightly fewer than 3,386 [the new “control group,” consisting of non-medicine/non-biology/non-biomed journals in DOAJ that have enough English in the interface for me to analyze them and that aren’t in one of the other sets] rows—a total of somewhere around 11,543. But I’m checking them more deeply; it feels like a much bigger project.
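The row counts themselves are easy to tally. A quick Python sketch (names are illustrative; the "somewhere around" and "slightly fewer than" counts are taken at face value):

```python
# Data rows in the primary worksheets of each project.
previous_project = {"Beall's Lists": 9_219, "OASPA": 1_531}
current_project = {
    "Beall subset (no MDPI)": 6_779,       # "somewhere around"
    "OASPA subset (with MDPI)": 1_378,     # exact
    "DOAJ control group": 3_386,           # "slightly fewer than"
}

previous_rows = sum(previous_project.values())
current_rows = sum(current_project.values())
growth_pct = (current_rows / previous_rows - 1) * 100

print(f"Previous project: {previous_rows:,} rows")
print(f"Current project:  {current_rows:,} rows ({growth_pct:.0f}% more)")
```

By row count alone the increase is modest, so the sense of a much bigger project comes from the deeper checking per row.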

So what’s the problem?

The spreadsheets I’ve built or am building are designed to allow me to look at patterns and counts.

They are not designed for “naming and shaming,” calling out specific journals in any way.

Yes, I did point out a few specific publishers in the July article, but only by quoting portions of their home pages. It was mostly cheap humor. I don’t plan to do it in the new project—especially since most of the journals in the new control group are from institutions with only one or a handful of journals; I think there are some 2,200 publisher names for 3,386 journals.

This is an important point: The July study did not name individual journals and say “stay away from this one, but this one’s OK.” Neither will the November/December study. That’s not something I’m interested in doing on a journal-by-journal or publisher-by-publisher basis. I lack the omniscience and universal subject expertise to even begin to consider such a task. (I question that anybody has such omniscience and expertise; I know that I don’t.) I offered possible approaches to drawing your own judgment, but that’s about it.

Nor do I much want to be the subject of “reanalysis” with regard to the grades I assigned. (I don’t want angry publishers emailing me saying “You gave us a C! We’re going to sue you!” either—such suits may be idiotic, but I don’t need the tsuris.)

Releasing the full spreadsheets would be doing something I explicitly do not want to do: spreading a new set of journal grades. There is no Crawford’s List, and there won’t be one.

For that matter, I’m not sure I much want to see my numbers revalidated: for both projects, I use approximation in some cases, on the basis that approximation will yield good results for the kind of analysis I’m doing. (I’ll explain most of the approximation and shortcuts when I write the articles; I try to be as transparent as possible about methodology.)

For those reasons and others, I would not be willing to release the raw spreadsheets.

Could you randomize or redact the spreadsheets to eliminate these problems?

Well, yes, I could—but (a) that’s more unpaid labor and, more important, (b) I’m not sure the results would be worth much.

Here, for example, are the data label rows and one (modified) data row from part of the current project:

Pub  Journal      2014  2013  2012  2011  Start  Peak  Sum  Gr  GrF  APC   Note
pos  POS Physics  15    34    14    1     2011   34    64   B        $600

The columns, respectively, show: the publisher code (in this case, Pacific Open Science, a nonexistent—I think—publisher I may use to offer hypothetical examples in the discussion. Their slogan: If an article is in our journals, it’s a POS!); the journal name; the number of articles in January-June 2014, all of 2013, all of 2012, all of 2011; the starting year; the peak annual articles; the sum of the four years; the letter grade; a new “GrF”—the letter grade that journals with fewer than 20 articles per year would get if they had more; the article processing charge for a 10-page article; and any note I feel is needed. (If this was the new DOAJ control group, there would be another column, because hyperlinks were stored separately in DOAJ’s spreadsheet; for the one I chose, “POS Physics” is itself a hyperlink—but, of course, there’s no such journal. Don’t try to guess—the actual journal’s not remotely related to physics.)

I’ll probably add a column or two during analysis—e.g., the maximum annual APCs a given journal could have collected, in this case 34×600 or $20,400, and for the new DOAJ group the subject entry to do some further breakdowns.
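As a rough illustration of that arithmetic, here is a minimal Python sketch using the modified example row above. The dictionary layout and the function names are assumptions made for the example, not the structure of the actual spreadsheet.

```python
# Hypothetical representation of the (modified) example row; the keys are
# assumptions for illustration, not the real spreadsheet's column layout.
row = {
    "journal": "POS Physics",
    # Article counts per year (2014 covers January-June only).
    "counts": {2014: 15, 2013: 34, 2012: 14, 2011: 1},
    "apc": 600,  # article processing charge in dollars
}


def peak_articles(counts):
    """Return the largest article count across the listed years."""
    return max(counts.values())


def max_annual_apc_revenue(counts, apc):
    """Upper bound on APC revenue in one year: peak article count times the APC."""
    return peak_articles(counts) * apc


total_articles = sum(row["counts"].values())
revenue_ceiling = max_annual_apc_revenue(row["counts"], row["apc"])

print(f"{row['journal']}: sum={total_articles}, max annual APC=${revenue_ceiling:,}")
```

For this row the sum is 64 and the ceiling is 34 × 600, or $20,400. It is a ceiling only: it assumes every article in the peak year paid the full listed charge.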

I could certainly randomize/redact this spreadsheet in such a way that it could be fully re-analyzed—that is, sort the rows on some combination that yields a semi-random output, delete the Pub column, and change the Journal column to a serial number equal to the row. Recipients would have all the data—but not the journal or publisher names. That wouldn’t even take very long (I’d guess ten minutes on a bad day).
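The redaction described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: it uses a random shuffle in place of the "sort on some combination" step, the column names are trimmed to a few of those in the label row, and every row except the POS Physics one is invented for the example.

```python
import random


def redact(rows, seed=None):
    """Return shuffled copies of the rows with the Pub column removed and
    the Journal column replaced by a serial number equal to the new row order."""
    rng = random.Random(seed)
    shuffled = [dict(r) for r in rows]  # copy so the original rows are untouched
    rng.shuffle(shuffled)               # stand-in for the "semi-random" sort
    redacted = []
    for serial, r in enumerate(shuffled, start=1):
        r.pop("Pub", None)              # drop the publisher code
        r["Journal"] = serial           # replace the journal name with a row number
        redacted.append(r)
    return redacted


# Hypothetical data: only the first row comes from the example above;
# the other two rows are made up purely to have something to shuffle.
rows = [
    {"Pub": "pos", "Journal": "POS Physics", "2013": 34, "Gr": "B", "APC": 600},
    {"Pub": "hyp", "Journal": "Hypothetical Letters", "2013": 8, "Gr": "A", "APC": 0},
    {"Pub": "hyp", "Journal": "Hypothetical Reviews", "2013": 52, "Gr": "C", "APC": 250},
]

for r in redact(rows, seed=1):
    print(r)
```

The counts, grades and APCs survive intact, so the patterns can still be re-analyzed, but nothing in the output identifies a journal or publisher.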

Would anybody actually want a spreadsheet like that? Really?

Alternatively, I could delete the Gr and GrF columns and leave the others—but the fact is, people will arrive at slightly different article counts in some significant percentage of cases, depending on how they define “article” and whether they take shortcuts. I don’t believe most journals would be off by more than a few percentage points (and it’s mostly an issue for journals with lots of articles), but that would still be troublesome.

Or, of course, I could delete all the columns except the first two—but in the case of DOAJ, anyone wanting to do that research can download the full spreadsheet directly. If I was adding any value at all, it would be in expanding Beall’s publisher entries.

What am I missing, and do you have great counter-arguments?

As you’ll see in the Friendfeed discussion, I got a little panicky about some potential Moral Imperative to release these spreadsheets—panicky enough that I pondered shutting down the new project, even though I was already about two-thirds of the way through. If I had had these requests when I began the project or was, say, less than 2,000 rows into it, I might have just shut it down to avoid the issues.

At this point, I believe I’m justified in not wanting to release the spreadsheets. I will not do so without some level of randomizing or redaction, and I don’t believe that redacted spreadsheets would be useful to anybody else.

But there are the questions above. Responses explicitly invited.

[Caveat: I wrote this in the Blog Post portion of Word, but it’s barely been edited at all. It’s probably very rough. A slightly revised version may—or may not—appear in the October 2014 Cites & Insights. If there is an October 2014 Cites & Insights.]

Now, back to the spreadsheets and looking at journals, ten at a time…

Added September 3, 2014:

Two people have asked–in different ways–whether I’d be willing to release a spreadsheet including only the journal names (and publishers) and, possibly, URLs.

Easy answer: Yes, if anybody thought it was worthwhile.

There are three possible sheets:

  • The Beall list, with publishers and the publisher codes I assigned on one page, the journals (with “xxind” as a publisher code for Beall’s separate journal list) and publisher codes on another page. All (I believe) publisher names and most but not all journal names have hyperlinks. (Some publishers didn’t have hyperlinked lists I could figure out how to download.) That one might be mildly useful as an expansion of Beall’s publisher list. (This would be the original Beall list, including MDPI, not the new one I’m using for the new study.)
  • The OASPA list, similarly structured and same comments, lacking MDPI (which is in the new one I’m using for the new study).
  • The new “partial DOAJ” list–DOAJ entries that aren’t in medicine, biology or biomed, that have English as a language code and that aren’t–if I got it right–in the other lists. I don’t honestly see how this could save anybody any time, since all it is is a portion of what’s downloadable directly from DOAJ, albeit in May 2014 rather than now.

If someone wants one of these, let me know. I may not respond immediately, but I’ll either return the sheet you want as an email attachment or, if there’s more than one request, possibly load it at or in Dropbox and send you a link.



Natureally, I’m delighted

Posted in Cites & Insights, open access on August 6th, 2014

My name appeared in a Nature news article today (August 6, 2014). Specifically:

The DOAJ, which receives around 600,000 page views a month, according to Bjørnshauge, is already supposed to be filtered for quality. But a study by Walt Crawford, a retired library systems analyst in Livermore, California, last month found that the DOAJ currently includes some 900 titles that are mentioned in a blacklist of 9,200 potential predatory journals compiled by librarian Jeffrey Beall at the University of Colorado Denver (see Nature 495, 433–435; 2013).

and, later in the piece:

Bjørnshauge says that a small cohort of some 30 voluntary associate editors — mainly librarians and PhD students — will check the information submitted in reapplications with the publishers, and there will be a second layer of checks from managing editors. He also finds it “extremely questionable to run blacklists of open-access publishers”, as Beall has done. (Crawford’s study found that Beall’s apparently voluminous list includes many journals that are empty, dormant or publish fewer than 20 articles each year, suggesting that the problem is not as bad as Beall says.)

Naturally (or Natureally), I’m delighted to have my name show up, and a C&I issue linked to, in Nature. (It didn’t come as a complete surprise: the journalist sent me email asking about my affiliation–none–and, later, where I live.)

I’m not quite as delighted with the slant of that first paragraph (quite apart from the fact that Beall’s lists do not list some 9,200 “potential predatory journals,” they include publishers that publish or “publish” that number of journal names). Namely, I think the story is not that 900 “potentially predatory” journals appear in DOAJ with the loose listing criteria that site formerly used. I think the story is that more than 90% of the journals in DOAJ are not reflected in Beall’s list, given his seeming zeal to target OA journals.

But, of course, it’s the journalist’s story, not mine, and I do not feel I was quoted incorrectly or unfairly. (Incidentally, I don’t have nits to pick with the second paragraph.)

I agree with Bjørnshauge that a blacklist is itself questionable.

Do I believe the much improved DOAJ will constitute a real whitelist? I’m not sure; I think it will be a great starting point. If a journal’s in the new DOAJ, and especially has the DOAJplus listing, it’s fair to assume that it’s probably a reasonably good place to be. (But then, I’m no more an expert in what journals are Good or Bad than Beall is.)

Anyway: thanks, Richard Van Noorden, for mentioning me. I hope the mention leads more people to read more about questionable journals than just Beall’s list. I strongly believe that the vast majority of Gold OA journals are as reputable as the vast majority of subscription journals, and I believe I’ve demonstrated that there aren’t any 9,200 “predatory” journals out there that are actual journals researchers with actual brains and a modicum of common sense would ever submit articles to.

A few readers may know that I’ve embarked on a related but even more ambitious (or idiotic) project, having to do with volume of articles and adding a new and very different control group. Dunno when (if?) I’ll finish the huge amount of desk work involved and produce some results. I do believe that, among other things, the results may shed some light on the apparent controversy over how prevalent APCs are among Gold OA journals… (And, incidentally, more financial support for C&I wouldn’t hurt this process.)

