In partial defense of Jeffrey Beall

March 25th, 2016

Not in defense of his lists, which I regard as a bad idea in theory and fatally flawed in practice, for reasons I’ve documented (most recently here but elsewhere over time).

But…I’ve seen some stuff on another blog lately that bothers me.

  • I do not for a minute believe that Jeffrey Beall wrote the supposed email I’ve seen that suggests a listed publisher would be re-evaluated for $5,000. That email was written using English-as-a-third-language grammar; it’s just not plausible as coming from Beall.
  • I truly dislike the notion that a doctorate is the minimum qualification for scholarship. But then, I would, wouldn’t I (since my pinnacle of academic achievement is a BA and a handful of credits toward an MA).
  • I also dislike the notion that state colleges are somehow disreputable. My own degree comes from a state institution, and I’ll match its credentials with anybody.

The same blog had an interesting fisking of one of Beall’s sillier anti-OA papers. I had tagged it toward a future Cites & Insights essay on access and ethics. But after seeing this other stuff…I won’t link to or source from this particular blog.  Heck, I’ve been the subject of Beall’s ad hominem attacks; doesn’t mean I have to support that sort of thing.

Cites & Insights 16:3 (April 2016) available

March 23rd, 2016

The April 2016 Cites & Insights (16:3) is now available for downloading at http://citesandinsights.info/civ16i3.pdf

That print-oriented version is 30 pages long. If you’re planning to read online or on an ereader, you may prefer the single-column 6″ x 9″ version, 59 pages long, available at http://citesandinsights.info/civ16i3on.pdf

While much of this issue has appeared as a series of posts in this blog, the final section of the lead essay is new, as is the fourth essay; the final section of the issue reprints 35 pages of The Gold OA Landscape 2011-2014 to serve as context for a portion of the first essay.

This issue includes:

The Front: Gold Open Access Journals 2011-2015: A SPARC Project pp. 1-8

Remember the “watch this space” note in the February-March “The Front”? This is what it was about. This essay includes the key announcement, a partial list of changes from the 2011-2014 project, a partial checkpoint prepared when I was halfway through the first pass, a section asking for possible “changes for the better” in the analysis and writeup (note that this year’s PDF ebook will be free and OA, since it’s a SPARC-sponsored project), another section discussing the planned anonymization of the (free) spreadsheet when analysis is done–and, new to this issue, a second checkpoint prepared at the end of the first journal pass.

The Front (also): Readership Notes  pp. 8-9

Notes on the most frequently downloaded issues in Volume 15 and the most frequently downloaded issues overall.

Intersections: “Trust Me”: The Other Problem with Beall’s Lists  pp. 9-11

As far as I can tell, Jeffrey Beall provides no evidence whatsoever–not even his classic “this publisher has a funny name”–for seven out of eight journals and publishers on his 2016 lists. This piece, which has a little additional material beyond the original post, goes into some detail.

The Back  pp. 11-12

Not precisely filler to get an even number of pages, but…OK, so these three mini-rants are mostly filler to get an even number of pages.

The Gold OA Landscape 2011-2014, pp. 39-73 (following page 12)

I’m including chapters 5 (starting dates), 6 (country of publication), 7 (segments and subjects), 8 (biology and medicine) and 9 (biology) to provide more context for my invitation to suggest better ways to analyze and present the 2011-2015 data. Please note that these pages appear precisely as they would in the PDF ebook if you’re looking at the online 6″ x 9″ version (since the book’s 6″x9″), but are reduced very slightly for the print-oriented version (to 5.5″x8.5″) so that two book pages will fit on one printed page.

Next issue?

I did not label this the April-May 2016 issue. Whether there’s a May issue in late April or early May, or a May-June issue later in May, depends on a number of factors having mostly to do with Gold Open Access Journals 2011-2015.

Why Anonymize?

March 14th, 2016

The project plan for Gold Open Access Journals 2011-2015 calls for me to make an anonymized version of the master spreadsheet freely available—and as soon as the project was approved, I made an anonymized version of the 2014 spreadsheet available.

Two people raised the question “Why anonymized?”—why don’t I just post the spreadsheet including all data, instead of removing journal names, publishers and URLs and adding a simple numeric key to make rows unique?
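For concreteness, here is a minimal sketch of what that anonymization step amounts to, assuming a pandas workflow; the file names and column names are illustrative placeholders, not the actual spreadsheet headers.

```python
import pandas as pd

# Hypothetical master file; column names stand in for the real headers.
master = pd.read_excel("goaj_2014_master.xlsx")

# Drop the identifying columns and add a simple numeric key so rows stay unique.
anonymized = master.drop(columns=["Journal", "Publisher", "URL"])
anonymized.insert(0, "Key", list(range(1, len(anonymized) + 1)))

anonymized.to_excel("goaj_2014_anonymized.xlsx", index=False)
```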

The short answer is that doing so would shift the focus of the project from patterns and the overall state of gold OA to specifics, and lead to arguments as to whether the data was any good.

Maybe that’s all the answer that’s needed. Although I counted very little use of the 2014 spreadsheet in January and February 2016, it’s been used more than 900 times in the first half of March 2016—but I have received no more queries as to why it’s anonymized. For any analysis of patterns, of course, journal names don’t matter. But maybe a slightly longer answer is useful.

That longer answer begins with the likelihood that some folks would try to undermine the report’s findings by claiming that the data is full of errors—and the certainty that such folks could find “errors” in the data.

Am I being paranoid in suggesting that this would happen? Thanks to Kent Anderson, I can safely say that I’m not: within a day or two of my posting the spreadsheet, he tweeted that errors exist in the dataset.

Anderson didn’t say “Am I misunderstanding?” or “Clarification needed” or any alternative suggesting that more information was needed. No: he went directly on the attack with “Errors exist” (by completely misreading the dataset, as it happens: around 500 gold OA journals began publication, usually not as OA, between 1853 and 1994).

It’s not wrong, it’s just different

To paraphrase Ed and Patsy Bruce (they wrote the song, even though Willie Nelson and Waylon Jennings had the big hit with it)…

If somebody else—especially someone looking to “invalidate” this research—goes back to do new counts on some number of journals, they will probably get different numbers in a fair number of cases.

Why? Several reasons:

  • Inclusiveness: Which items in journals—and which journals—do you include? The 2014 count tended to be more exclusive when I had to count each article individually; the 2015 count tends to include all items subject to some form of review, including book reviews and case reports. Similarly, the 2015 report includes journals that consist of (reviewed) conference reports (although I’ll note the subset of such journals).
  • Shortcuts: I did not in fact look at each and every item in each and every issue of each and every journal, compare it to that journal’s own criteria for reviewed or peer-reviewed, and determine whether to include it. To do that, I’d estimate that a single year’s count would require at least 2,000 hours exclusive of determining APC existence and levels and all other overhead—and, of course, a five-year study would require four times that amount (fewer journals and articles in earlier years). That’s not plausible under any circumstances. Instead, I used every shortcut that I could: publication-date indexes or equivalent for SciELO, J-Stage, MDPI, Dove and several others; DOI numbers when it’s clear they’re assigned sequentially; numbered tables of contents; Find (Ctrl-F) counts for distinctive strings (e.g., “doi:” or “HTML”) after quick scans of the contents tables. For the latter, I did make rough adjustments for clear editorials and other overhead. (A rough sketch of the distinctive-string shortcut, and of the estimating approach in the next item, appears after this list.)
  • Estimates: In some cases—fewer in 2015 than in 2014, but still some—I had to estimate, as for instance when a journal with no other way of counting publishes hundreds of articles each year and maintains page numbering throughout a dozen issues. I might count the articles in one or two issues, determine an average article length, and estimate the year’s total count based on that length. I also used counts from DOAJ in many cases, when those counts were plausible based on manual sampling.
  • Errors: I’m certain that my counts are off by one or two in some cases; that happens.
  • Late additions: Some journals, especially those that are issue-oriented and still include print versions, post online articles very late. Even though I’m retesting all cases where the “final issue” of 2015 seemed to be missing when checked in January-March 2016, it’s nearly certain that somebody looking at some journals in, say, August 2016 will find more 2015 articles than I did.
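Here is a rough sketch of the distinctive-string shortcut and the average-length estimate described above, assuming a Python workflow; the URL, the adjustment, and the page figures are placeholders, not values from the project.

```python
import requests

# Shortcut: count a distinctive string (here "doi:") in a table-of-contents page,
# roughly what a Ctrl-F count in the browser would give.
toc_html = requests.get("https://example.org/journal/2015/contents").text
raw_count = toc_html.lower().count("doi:")
article_count = raw_count - 2          # rough adjustment for editorials and front matter

# Estimate: when a journal numbers pages continuously across a year's issues,
# count one or two issues, take an average article length, and scale up.
last_page_of_year = 1450               # final page number in the last 2015 issue
average_article_length = 12            # pages, from a sampled issue
estimated_articles = round(last_page_of_year / average_article_length)

print(article_count, estimated_articles)
```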

In practice, I doubt that any two counts of a thousand or more OA journals will yield precisely the same totals. I’d guess that I’m very slightly overcounting articles in some journals that provide convenient annual totals—and undercounting articles in some journals that don’t.

For the analysis I’m doing, and for any analysis others are likely to do, these “errors” shouldn’t matter. If somebody claimed that overall numbers were 5% lower or 5% higher, my response would be that this is quite possible. I doubt that the differences in counts would be greater than that, at least for any aggregated data.

Making the case

If you believe I’m wrong—that there are real, serious, worthwhile research cases where only the unanonymized version will do—let me know (waltcrawford@gmail.com).

Obviously, anonymized datasets aren’t unusual; I don’t know of any open science advocate who would seriously argue that medical data should be posted with patient names or that libraries should keep enough data to be able to do analysis such as “people who borrowed X also borrowed Y.” In practice, there may be special use cases for an open copy of the master spreadsheet. On the other hand, except for the list of journals flagged as having malware on their sites, I’ll be doing my analysis with the anonymized spreadsheet—it’s what’s needed for this work, and won’t distract me with individual journal titles and how I might feel about their publishers.

Changes for the Better?

March 11th, 2016

Do you have suggestions that will help make Gold Open Access Journals 2011-2015 even better than The Gold OA Landscape 2011-2014?

If so, now’s the time to suggest them—any time between now and May 1, 2016 (the earliest date I’m likely to start working on data analysis and the book manuscript). Suggestions should go to me at waltcrawford@gmail.com.

You say you haven’t purchased the book yet, either in paperback or PDF ebook form? You still can, and it will still be worthwhile when the new book comes out.

Alternatively, you can get a good idea of the general approach and tables used in the excerpt published as the October 2015 Cites & Insights, although that version lacks any graphs.

I’ve appended pages 39 through 73 of The Gold OA Landscape 2011-2014 to the end of the next Cites & Insights, probably out in late March 2016. That segment includes almost all varieties of tables and graphs used in the book. The online version is an exact replica of the print book; the print (two-column) version is just slightly smaller, so that four pages of the 6×9″ book fit on each 8.5×11″ sheet rather than having loads of waste space.

The Basics

Basically, the data used for analysis includes, for each journal: the year reported to DOAJ (which is not always the start of publication); the country of publication (again as reported to DOAJ); one of 28 subjects and one of three broad areas, which I’ve derived from the subjects, keywords and journal/article titles; and the data I went looking for, namely whether there’s an author-side fee (usually called an APC or Article Processing Charge, though the charges aren’t always straightforward) and how much it is, plus the number of published articles (and similar items) for each year 2011 through 2015. There’s also a two-letter code (or “grade and subgrade”) for special cases, but most journals don’t have special codes. I also derive some measures: the peak article count during the five years and, if there are APCs, the maximum revenue for 2014 (2015 this time around).
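To make that list of fields concrete, here is a sketch of roughly what one journal’s row holds; the field names and example values are mine, not the spreadsheet’s actual column headers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JournalRow:
    key: int            # numeric key; rows are otherwise anonymous
    doaj_year: int      # year reported to DOAJ (not always the true start)
    country: str        # country of publication, as reported to DOAJ
    subject: str        # one of 28 derived subjects
    area: str           # one of three broad areas (biomed, STEM, humanities/social sciences)
    apc: Optional[int]  # author-side fee, None if no fee
    articles: dict      # article counts keyed by year, 2011 through 2015
    grade: str = "A"    # two-letter grade/subgrade for special cases

example = JournalRow(
    key=1, doaj_year=2012, country="BR", subject="Biology", area="biomed",
    apc=None, articles={2011: 0, 2012: 14, 2013: 22, 2014: 25, 2015: 31},
)

# Derived measures mentioned above: peak article count and maximum possible revenue.
peak_articles = max(example.articles.values())
max_revenue = (example.apc or 0) * example.articles[2015]
```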

Last year, after an overall discussion of maximum revenues, overall article counts, and special cases, I looked at journals by annual article volume for each of the three major areas (which have very different characteristics), fee and revenue levels, starting dates for free and APC-charging journals, and a number of measures by country of publication. I also provided one set of pie charts breaking down free and pay journals by major area.

For each of the three major areas (biomed, STEM, and humanities and social sciences) I looked at cost per article by year, journal and article volume by year (and free percentage of each), revenue brackets for journals, article volume brackets, and APC level brackets. A bar graph showed free and pay articles for each year.

For each subject within an area—using the revenue and article volume brackets appropriate for that area—I showed journals and articles for each year (and free percentage), the free/pay article bar graph, journals by article volume (and percent free), journals and articles by APC range, a line graph showing free and pay journals by starting date, and a table showing the countries with the most published 2014 articles for that subject.

At the end of the book, I provided a few subject summaries—percentage of free journals, percentage of articles in no-fee journals, change in article volume, change in free article volume, journals changing article volume by 10% or more from 2013 to 2014, average APC per paid article and for all articles, median APC per paid article and all articles, and the median, first quartile, and third quartile articles per journal for 2014.

Data Changes for 2015

There’s another year of data—more journals and more data for existing journals. I’m taking some pains to include more journals (and defining “articles” somewhat more inclusively and, I believe, consistently).

Beyond that, there may be one new category of derived data: a publisher category—breaking journals down into what seem to be five reasonable groups based on what’s in the DOAJ publisher field:

  • Academic, published by universities and colleges, including university presses.
  • Society, published by societies and associations.
  • Traditional*, published by publishers that also publish subscription journals.
  • OA publisher*, published by groups that don’t appear to publish subscription journals (and that publish at least a handful of journals—see notes on the “*” below).
  • Miscellany, everybody else.

About the asterisk on Traditional and OA publisher: there are 5,983 different “publisher names” (that is, distinct character strings in the DOAJ publisher field). That’s more than one “publisher” for every two journals. The vast majority of those, all but 919, publish a single DOAJ-listed journal.

I think it’s reasonable to limit the two “publisher” categories (Traditional and OA) to firms that publish at least a handful of journals, and lump the others in as Miscellany. (If nothing else, it makes this added data feasible.)

What’s a handful? If the cutoff is “five or more,” it involves only 221 publishers in all, accounting for 4,128 journals. If the cutoff is “four or more,” it involves 316 publishers—and, naturally, adds 380 journals for a total of 4,508. Dropping it to “three or more journals” brings us up to 486 publishers and 5,018 journals. I suspect the final cutoff will be either four or five.
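Here is a sketch of how those cutoff counts can be derived from the DOAJ metadata; the file name and column name are assumptions, not the actual export.

```python
import pandas as pd

# Hypothetical DOAJ journal-metadata export; "Publisher" stands in for the real column name.
doaj = pd.read_csv("doaj_journals_2015-12-31.csv")
journals_per_publisher = doaj.groupby("Publisher").size()

for cutoff in (5, 4, 3):
    qualifying = journals_per_publisher[journals_per_publisher >= cutoff]
    print(f"{cutoff}+ journals: {len(qualifying)} publishers, {qualifying.sum()} journals")
```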

Incidentally, if I add that column, it will be in the anonymized spreadsheet made publicly available at the end of this project. Other than the list of journal titles apparently containing malware, it will be possible for anybody else to replicate any or all of the graphs and numbers in the book.

Probable Changes

I believe it will make sense to devote a chapter to publisher categories—whether there are major differences in article volume, APC charges (existence and amount) and, possibly, domination in some countries.

I’m fairly certain the pie charts will go away: I don’t believe they add enough information to justify the space. I could be convinced otherwise. (Note that the print paperback will, of necessity, be black and white to keep production costs down, so really attractive pie charts aren’t feasible.)

Possible Changes

What else should I consider? Which existing tables and graphs don’t seem especially valuable—and what would work better? (Assume that this year’s book can be larger than last, but not enormously larger.)

I’m open to suggestions, which I’ll discuss with my contacts at SPARC (and I anticipate suggestions from SPARC as well).

I would offer a free PDF version of this year’s book as a reward for good suggestions—but since this year’s PDF version will be free in any case, that’s not much of an incentive.

Gold OA Journals 2011-2015: Grade Changes and an Update

February 10th, 2016

After reviewing the numbers in The Gold OA Landscape 2011-2014 and considering what I can and, more significantly, cannot reasonably ascertain and judge in non-English journals and in short visits to websites, and in consultation with SPARC contacts, I made a number of changes in grades and, as a result, in exclusions.

I did not change the list of subjects and areas, although a few journals may have been assigned new subjects—and, as in the previous study, PLOS One is omitted from subject and area figures but included in overall discussions.

The fundamental meaning of Grade B has changed from “deserves attention” to “might be excluded from DOAJ or from some definitions of open access.”

Changes in Grade A Subgrades

All subgrades for Grade A have been eliminated. Subgrade C (ceased) is now a subgrade for Grade B. Subgrades D, E, H, O and S—all cases where some year other than the first had fewer than five articles—have been collapsed into Grade B, Subgrade F (few or no 2015 articles) if the article count for 2015 is less than 5 and simply Grade A otherwise.
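A brief sketch of that collapse rule, written as a function for clarity; the code is mine, not anything from the project itself.

```python
def collapse_old_a_subgrade(old_subgrade: str, articles_2015: int) -> str:
    """Map an old Grade A subgrade to the new grading scheme."""
    if old_subgrade == "C":                        # ceased: now a Grade B subgrade
        return "BC"
    if old_subgrade in {"D", "E", "H", "O", "S"}:  # some year other than the first had < 5 articles
        return "BF" if articles_2015 < 5 else "A"  # few/no 2015 articles, else plain A
    return "A"                                     # everything else is simply Grade A
```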

Changes in Grade B Subgrades

Grade B consists of journals that may or may not belong, either in DOAJ or in a study of open access, depending on your definitions. The old subgrades all have to do with mild visual or editorial issues that now seem as though they’re imposing my own values inappropriately.

There are four new subgrades—two from Grade A and two from Grade X, albeit with different letters.

  • C: Ceased—journals that published at least one article later than 2010 but explicitly ceased during or before 2015, have merged with other journals, or show no articles more recent than 2012.
  • F: Few or no 2015 articles—journals that published at least one article later than 2012 and published fewer than five articles in 2015. (By current DOAJ rules, these are subject to delisting.)
  • R: Conference and other reports—journals consisting entirely or primarily of conference papers and other reports. These were previously excluded, in subgrade XN, as not OA.
  • S: Sign-in or registration required—journals that require some form of registration before reading articles. These were previously excluded, also in subgrade XN, as not OA.

Changes in Grade C Subgrades

Grade C, “avoid this journal,” has been narrowed somewhat, specifically to eliminate subgrades that involve personal judgment or have so few journals that they’re hardly worth noting. Specifically, subgrades E (very bad English), S (incoherent site) and T (absurd article titles—there were almost none of these) have been eliminated, leaving subgrades A (APC missing), F (clear falsehoods), O (mix of problems) and P (implausible peer review turnaround). Briefly, clear falsehoods are statements such as “the leading journal in this field” for a brand-new journal; implausible peer-review turnaround involves promises to complete all peer reviews in a couple of days.

Changes in Grade X Subgrades

Grade X, excluded journals, retains the same subgrades—but the two largest categories within subgrade N (not OA) have been moved to subgrades BR and BS.

A Partial Checkpoint

What are the consequences of these changes? In general, and combined with more exhaustive checking of some difficult situations, they should mean that more journals will be included in the full analysis. As for specific results, those won’t be clear until the project is complete.

I thought it would be worth offering some glimpses into what might be happening at a natural breakpoint: essentially halfway through the first pass of data gathering (actually 5,500 of 10,948).

First pass? Yes indeed. There will be a second pass, beginning no earlier than April 1, 2016, for quite a few of the journals, for various reasons:

  • Many smaller journals, especially in the humanities and social sciences, post online articles and issues with significant delays. In practice, even waiting a year won’t get them all. I’m rechecking all journals that appear to be missing final issues for 2015; this gives them at least three months to get the articles posted.
  • I’m rechecking all journals that couldn’t be reached or that showed signs of malware, as well as those that showed as parking or ad pages or were unworkable.
  • I’ll take a second look at journals excluded for various reasons, trying harder to make sense of opaque cases and translation difficulties, looking more closely for apparently-missing APCs, rechecking whether certain journals are OA or not.

So far, it looks as though I’ll need to recheck about one-fifth of the journals: 1,047 of the first 5,500. I’d be delighted if that percentage goes down in the second half—but I’d also be surprised.

All the rest of these numbers are truly tentative, since review of the journals may change their categorization.

Free and Pay

Some journals that didn’t previously have APCs have started imposing them (one large publisher dropped all of its free introductory periods); some (fewer) have dropped APCs; and some have clarified the nature of their charges.

Overall, the percentage of no-APC journals (among journals where it’s clear) in the first half dropped from 64.9% to 59.8%: there are more no-fee journals than in the previous study, but there are a lot more APC-charging journals. (There are also, to be sure, more journals in general: about 412 more so far.) There are fewer journals (so far) where there is an APC but it’s hidden.

The Newbies

Most journals that weren’t in the 2014 study are simply A (that is, “nothing special here one way or the other”), but 30 have fewer than five articles in 2015, a few couldn’t be contacted or were unworkable, a handful fall into various other categories—and, unfortunately, nine showed signs of malware.

Neutral Changes

Some changes in grade and subgrade are neutral: they’re just redefinitions. That’s true for the journals that changed from various A grades to BC (ceased explicitly or with no articles later than 2012): there are some 218 BC so far. It’s also true for the various A subgrades that are now simply A (around 230 of them) and for a number of other changes including quite a few moving from B subgrades to A.

Some 300 journals had five or more articles in 2014 but not in 2015, moving them all to BF: some of those will add articles in a recheck.

Changes for the Good

Some 27 journals previously graded CA (APC missing or hidden) now have more clarity (and four changed to various X subgrades).

Quite a few journals with explicit falsehoods on their homepages have been cleaned up—at least 80 of them.

Half a dozen journals flagged for malware no longer seem to have that problem (but see later!).

Most “not OA” entries in the first half have been reclassified on re-examination or redefinition, including 35 journals oriented to conference programs (another seven that had been “A” appear to be predominantly conferences and have been moved here) and ten that require registration to read articles. Some two dozen others moved to other categories, including 17 that now appear to be proper OA journals.

Most journals that I previously found too difficult to count (XO) are now handled, and I hope to reduce the number (70 for this half in the previous study is currently down to 28) even further.

Roughly half of the XT (couldn’t understand the site well enough to measure it) cases have been cleared up: so far, there are only three such journals in the first half, and I’ll try all of them again.

Changes for the Bad

A few journals have changed home pages such that I can no longer find an APC (but am sure they have one), but it’s a tiny number.

Some 70 journals that were reachable the last time around are either unreachable or unworkable when I checked this time; they’ll all be rechecked, but it’s unfortunate that there are so many.

Finally there’s the most unfortunate group, in my opinion: journals that now show signs of malware—frequently, I suspect, because they include ad networks that don’t have proper standards. A journal gets flagged for malware if Malwarebytes or McAfee Site Advisor or Windows Defender flags it or some of its components as malware; cases include phishing attempts and deliberate malware downloads. There are now twice as many of these as there were (for this subset of journals) in the previous study, and that’s about 72 too many.

Summing Up

Hundreds of new journals; a much shorter and simpler set of grades; adding literally thousands of peer-reviewed articles that were given as conference papers.

Far fewer journals falling by the wayside because I only read English (thanks, Google!) or because I can’t or am unwilling to count them (with true broadband, I’m willing to open up a dozen PDFs a year to see how many articles there are).

There will still be some approximate counts, but fewer (and better approximations) than last time around.

And, of course, the results will be freely available to everybody. In a few months.

Not quite gone: a short catchall post

February 9th, 2016

Just thought I’d drop a line to say why I’m posting even less than usual, and why that’s likely to continue for a few weeks or months…

You can guess the major reason: Gold Open Access Journals 2011-2015.

I’m trying to do the scan as carefully as possible, and include as many DOAJ-listed journals as possible.

Oh, that’s not all I do: I rarely do any of it after supper, there’s still (some) TV, I’m still reading roughly a book a week and lots of magazines, there’s still the Wednesday hike (or long walk) and the daily 1.3-mile walk around the block. But it takes up a fair amount of time.

Optimistic schedule

If all goes well, I hope to complete the first pass sometime in mid-March. I won’t start the second pass (revisiting a couple of thousand journals where revisits are required or advisable) until early April.

In between, I hope to put together some sort of Cites & Insights issue.

But there’s also a medical situation in late March that could have me out of commission (at least where typing’s concerned) for anywhere from a day or two to several weeks; the day or two is more likely, but you never know. (Benign Schwannoma on the forearm, if you must know: “benign” being the key word.)

Come April, there’s the rescan–a lot fewer journals, but each one will take significantly more time. At least I hope many of them do: part of the revisit is all journals that were unreachable or unworkable or raised malware flags, and I hope a fair number of those don’t have the same exclusionary conditions.

(So far, the only discouraging part of this new project is that too damn many OA journals–not very many in the overall scheme of things, but still too damn many–cause Malwarebytes or McAfee Site Advisor or Windows Defender or, in one case, MS Office to say “do you really want to go there?” I believe that uncontrolled ad sites make up a lot of the problem, but in any case it is simply not acceptable for any journal site to have code that triggers malware warnings. Nor will I ignore the warnings. If I had a dedicated Chromebook, I suppose I could–but that wouldn’t be helpful for others. And yes, I did get a serious bit of malware last time around, and it became clear that at least one other journal was trying to install the same code; that’s why I use Malwarebytes these days.)

I’m guessing I’ll need to take more breaks during the rescan, so there may be more blog posts and activity at Cites & Insights. Then, of course, comes the analysis and writeup… after which I may have a good deal more time. Or not.

Not complaining; I love this. It’s a little triumph each time I can fully analyze a journal I’d left out before, even if it means opening up a dozen PDFs for each of the past five years. At least now I have real broadband, so that’s feasible if annoying. (“Real broadband” as in Comcast, guaranteed 25 Mbps, actual 30 Mbps–as opposed to “Uverse” 1.5 Mbps but dropping entirely once or twice or more a day.)

Still around, still mildly active in various parts of the LSW diaspora, but mostly doing research. And enjoying it.

“Trust Me”: The Other Problem with 87% of Beall’s Lists

January 29th, 2016

Here’s the real tl;dr: I could only find any discussion at all in Beall’s blog for 230 of the 1,834 journals and publishers in his 2016 lists—and those cases don’t include even 2% of the journals in DOAJ.

Now for the shorter version…

As long-time readers will know, I don’t much like blacklists. I admit to that prejudice, er, belief: I don’t think blacklists are good ways to solve problems.

And yet, when I first took a hard look at Jeffrey Beall’s lists in 2014, I was mostly assessing whether the lists represented as massive a problem as Beall seemed to assert. As you may know, I concluded that they did not.

But there’s a deeper problem—one that I believe applies whether you dislike blacklists or mourn the passing of the Index Librorum Prohibitorum. To wit, Beall’s lists don’t meet what I would regard as minimal standards for a blacklist even if you agree with all of his judgments.

Why not? Because, in seven cases out of eight (on the 2016 lists), Beall provides no case whatsoever in his blog: the journal or publisher is in the lists Just Because. (Or, in some but not most cases, Beall provided a case on his earlier blog but failed to copy those posts.)

Seven cases out of eight: 87.5%. 1,604 journals and publishers of the 1,834 (excluding duplicates) on the 2016 versions have no more than an unstated “Trust me” as the reason for avoiding them.

I believe that’s inexcusable, and makes the strongest possible case that nobody should treat Beall’s lists as being significant. (It also, of course, means that research based on the assumption that the lists are meaningful is fatally flawed.)

The Short Version

Since key numbers will appear first as a blog post on Walt at Random and much later in Cites & Insights, I’ll lead with the short version.

I converted the two lists into an Excel spreadsheet (trivially easy to do), adding columns for “Type” (Pub or Jrn), Case (no, weak, maybe or strong), Beall (URL for Beall’s commentary on this journal or publisher—the most recent or strongest when there’s more than one), and—after completing the hard work—six additional columns. We’ll get to those.

Then I went through Beall’s blog, month by month, post by post. Whenever a post mentioned one or more publishers or independent journals, I pasted the post’s URL into the “Beall” column for the appropriate row, read the post carefully, and filled in the “Case” column based on the most generous reading I could make of Beall’s discussion. (More on this later in the full article, maybe.)
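As an illustration of that working layout, here is a sketch of the spreadsheet’s core columns; the rows are invented examples, not entries from the actual file.

```python
import csv

rows = [
    # Name,                 Type,  Case,   Beall (URL of the relevant blog post, if any)
    ("Example Publishing",  "Pub", "weak", "https://example.org/2014/04/example-publishing/"),
    ("Journal of Examples", "Jrn", "no",   ""),
]

with open("beall_2016_assessment.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Type", "Case", "Beall"])
    writer.writerows(rows)
```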

I did that for all four years, 2012 through 2015, and even January 2016.

The results? In 1,604 cases, I was unable to find any discussion whatsoever. (No, I didn’t read all of the comments on the posts. Surely if you’re going to condemn a publisher or journal, you would at least mention your reasons in the body of a post, right?)

If you discard those on the basis that it’s grotesquely unfair to blacklist a journal or publisher without giving any reason why, you’re left with a list of 53 journals and 177 publishers. Giving Beall the benefit of the doubt, I judged that he made no case at all in five cases (the fact that you think a publisher has a “funny name” is no case at all, for example). I think he made a very weak case (e.g., one questionable article in one journal from a multijournal publisher) in 69 cases. I came down on the side of “maybe” 43 times and “strong” 113 times, although it’s important to note that “strong” means that at some point for some journal there were significant issues raised, not that a publisher is forever doomed to be garbage.

Call it 156 reasonable cases—now we’re down to less than 10% of the lists.

Then I looked at the spreadsheets I’m working on for the 2015 project (note here that SPARC has nothing at all to do with this little essay!)—”spreadsheets” because I did this when I was about 35% of the way through the first-pass data gathering. I could certainly identify which publishers had journals in DOAJ, but could only provide article counts for those in the first 35% or so. (In the end, I just looked up the 53 journals directly in DOAJ.)

Here’s what I found.

  • Ignoring the strength of case, Beall’s lists include 209 DOAJ journals—or 1.9% of the total. But of those 209, 85 are from Bentham Open (which, in my opinion, has cleaned up its act considerably) and 49 are from Frontiers Media (which Beall never actually made a case to include in his list, but somehow it’s there). If you eliminate those, you’re down to 75 journals, or 0.7%: Less than one out of every hundred DOAJ journals.
  • For that matter, if you limit the results to strong and maybe cases, the number drops to 37 journals: 0.33%, roughly one in every three hundred DOAJ journals.
  • For journals I’ve already analyzed (and since I’m working by publisher name, that includes most of these—at this writing, January 29, I just finished Hindawi), total articles were just over 16,000 (with more to come on a second pass) in 2015, just under 14,000 in 2014, just over 10,000 in 2013, around 8,500 in 2012, and around 4,500 in 2011.
  • But most of those articles are from Frontiers Media. Eliminating them and Bentham brings article counts down to the 1,700-2,500 range. That’s considerably less than one half of one percent of total serious OA articles.
  • The most realistic counts—those where Beall’s made more than a weak case—show around 150 articles for 2015, around 200-250 for 2013 and 2014, around 1,000 for 2012 and around 780 for 2011. (Those numbers will go up, but probably not by much. There was one active journal that’s mostly fallen by the wayside since 2012.)

The conclusion to this too-long short version: Beall’s lists are mostly the worst possible kind of blacklist: one where there’s no stated reason for things to be included. If you’re comfortable using “trust me” as the basis for a tool, that’s your business. My comment might echo that of Joseph Welch, but that would be mean.

Oh, by the way: you can download the trimmed version of Beall’s lists (with partial article counts for journals in DOAJ, admittedly lacking some of them). It’s available in .csv form for minimum size and maximum flexibility. Don’t use it as a blacklist, though: it’s still far too inclusive, as far as I’m concerned.

Modified 1/30: Apparently the original filename yields a 404 error; I’ve renamed the file, and it should now be available. (Thanks, Marika!)

Gold Open Access Journals 2011-2015: A SPARC Project

January 22nd, 2016

I’m delighted to announce that SPARC (the Scholarly Publishing and Academic Resources Coalition) is supporting Gold Open Access Journals 2011-2015, an update intended to provide an empirical basis for evaluating Open Access sustainability models. I am carrying out this project with SPARC’s sponsorship, building from and expanding on The Gold OA Landscape 2011-2014.

The immediate effect of this project is that the dataset for the earlier project is publicly available for use on zenodo.org and on my personal website. The data is public domain, but attribution and feedback are both appreciated.

Here’s what the rest of the project means:

  • I am basing the study on the Directory of Open Access Journals as of December 31, 2015. With eleven duplicates (same URL, different journal names, typically in two languages) removed and reported back to DOAJ, that means a starting point of 10,948 journals. All journals will be accounted for, and as many as feasible will be fully analyzed.
  • The grades and subgrades have been simplified and clarified, and two categories of journal excluded from the 2014 study will now be included (but tagged so that they can be counted separately if desired): journals consisting primarily of conference reports peer-reviewed at the conference level, and journals that require free registration to read articles.
  • I’m visiting all journal sites (and using DOAJ as an additional source) to determine current article processing charges (if any), add 2015 article counts to data carried over from the 2014 project, clean up article counts as feasible, and add 2011-2014 article counts for journals not in the earlier report.
  • Since some journals (typically smaller ones) take some time to post articles, and since some journals will not be analyzed for various reasons (malware, inability to access, difficulty in translating site or counting articles), I’ll be doing a second pass for all those requiring such a pass, starting in April 2016 or after the first pass is complete. My intent is to include as many journals as possible (although existence of malware is an automatic stopping point), although that doesn’t extend to (for example) going through each issue of a weekly journal only available in PDF form.
  • The results will be written up in a form somewhat similar to The Gold OA Landscape 2011-2014, refined based on feedback and discussion.
  • Once the analysis and preparation are complete, the dataset (in anonymized form) will be made freely available at appropriate sites and publicized as available.
  • The PDF version of the final report will be freely available and carry an appropriate Creative Commons license.
  • A paperback version of the final report will be available; details will be announced closer to publication.
  • A shorter version of the final report will appear in Cites & Insights, and it’s likely that notes along the way will also appear there.

My thanks to SPARC for making this possible.

Dataset for The Gold OA Landscape 2011-2014 now available

January 21st, 2016

I’m pleased to announce that the anonymized dataset used to prepare The Gold OA Landscape 2011-2014 is now available for downloading and use.

The dataset–an Excel .xlsx spreadsheet with two worksheets–includes 9,824 rows of data, one for each journal graded A through C (and, thus, fully analyzed) in the project. Each row has a dozen columns. The columns are described on the second worksheet, “data_key.”
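A minimal sketch for loading the dataset, assuming a pandas workflow; the file name here is an assumption, though the “data_key” sheet name comes from the description above.

```python
import pandas as pd

dataset_path = "gold_oa_landscape_2011-2014_dataset.xlsx"       # hypothetical file name

data = pd.read_excel(dataset_path, sheet_name=0)                # 9,824 journal rows, a dozen columns
data_key = pd.read_excel(dataset_path, sheet_name="data_key")   # column descriptions

print(data.shape)
print(data_key)
```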

I would love to be able to say that this dataset was now on figshare–but after wasting (er, spending) far too much time attempting to complete the required fields and publish the dataset, it appears that the figshare mechanisms are at least partly broken. When (if) I receive assurances that the scripts (which fail in current versions of Chrome, Firefox and Internet Explorer) have been fixed, I’ll add the dataset there–although I’d be happy to hear about other no-fee dataset sharing sites that actually work. (It’s possible that figshare just doesn’t much care for free personal accounts any more: I also note that the counts of dataset usage that were previously available have disappeared.)

Update January 22, 2016: This dataset is now available on zenodo.org. (Hat-tip to Thomas Munro.)

As always, the best way to understand the data in this spreadsheet is via either the paperback version or the PDF ebook site-licensed version of The Gold OA Landscape 2011-2014.


Note: This isn’t quite the “Watch This Space” announcement foreshadowed in Cites & Insights 16:2, and it doesn’t mean that sales of the book have suddenly mushroomed. That announcement–which is related to this one–should come in a few days.

By the way, while the dataset consists of facts and is therefore in the public domain, I’d appreciate being told about uses of the spreadsheet and certainly appreciate proper attribution. Send me a note at waltcrawford@gmail.com

I’d also love your suggestions as to ways the presentation in the book could be improved if or when there’s a newer version…leave a comment or, again, send email to waltcrawford@gmail.com

“Trust me”: The Apparent Case for 90% of Beall’s List Additions

January 7th, 2016

I’ve tried to stay away from Beall and his Lists, but sometimes it’s not easy.

The final section of the Intersections essay in the January 2016 Cites & Insights recounts a quick “investigation” into the rationales Beall provided for placing 223 publishers on his 2014 list. Go to page 8: it’s the section titled “Lagniappe: The Rationales, Once Over Easy.” I could find a rationale for condemning the publishers in only 35% of cases.

Perhaps too charitably, I assumed that it was because Beall’s blog changed platforms and he didn’t take the time to restore older posts to the new blog.

Then I noted his 2016 lists–which add 230 (or more) publishers and 375 (or more) independent journals to the 2015 lists. I say “or more” because at least one major publisher has been removed via the Star Chamber Appeal Process, even though Beall continues to attack the publisher as unworthy.

In any case: 605 new listings. My recollection is that there haven’t even been close to 605 posts on Beall’s blog in the past year… but I thought I’d check it out.

The results: As far as I can tell, posts during 2015 include around 60 new publishers and journals. (I may have missed a couple of “copycat” journals, so let’s call it 65).

Sixty or 65. Out of 605.

In other words: for roughly 90% of publishers (most of them really “publishers,” I suspect) and journals added to the list, there is no published rationale whatsoever for Beall’s condemnation.

None.

So if you’re wondering why I regard Beall as irrelevant to the reality of open access publishing (which isn’t all sweetness & light, any more than the reality of subscription publishing), there’s one answer.