The cuts to DOAJ: A few preliminary notes

May 11th, 2016

I’m partway through writing Gold Open Access Journals 2011-2015, as some of you are well aware. That book is based on an exhaustive survey of journals in the Directory of Open Access Journals (DOAJ) as of December 31, 2015: their APCs (or lack thereof) and article counts 2011-2015.

When I became aware of the big cut on May 9, 2016, with DOAJ dropping some 3,300 journals that had not submitted requests to be included following the new rules–after more than a year of publicity and repeated email requests–I decided it was worth discussing at the end of the book. (It doesn’t really affect the book: these journals were in DOAJ on 12/31/15.)

That continues to be my plan, but since I put together a matrix workbook to make some tables and graphs for the book easier and more consistent, I thought I could do a quick workup now–preliminary, tentative, but probably pretty close.

Update 5/22/16: I’ve now completed a more careful matching of a slightly later DOAJ dataset, resulting in 2,948 dropped journals. That change–nine more journals that are still there–will change a few numbers, but not by much. The revised figures will show up in the book, probably within the next two weeks.

The Overall Picture

URL and journal title matching shows 2,957 journals missing on May 10, 2016 that were there on December 31, 2015.

  • First good news: More than half of the journals I excluded from the study are now gone–316 of 620. That includes more than two-thirds of journals with hidden or missing APCs (and I’m guessing the rest have filled in the information) and almost half of the unreachable and unworkable journals. Unfortunately, it includes less than one-third of the journals showing signs of malware. (Curiously, it includes the only journal I couldn’t include because of translation problems–and, perhaps less curiously, more than 70% of those where it was impossible or too cumbersome to count articles by year.)
  • Oddly, while three-quarters of journals with no 2014 or 2015 articles are gone, as are most journals with no 2015 articles, only 38% of apparently-cancelled journals and 36% of journals seemingly too small for the new DOAJ are gone.
  • Ignoring excluded journals, just under 26% of journals are gone–but, not surprisingly, that breaks down to only 1.4% of APCLand journals and 29% of OAWorld journals. (If you’re not familiar with those terms, read the current Cites & Insights.)
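
As a rough illustration of the URL and journal title matching mentioned above, here is a minimal Python sketch. It is not the workbook actually used for these numbers: the normalization rules, the function names and the (title, url) row shape are all assumptions made for the example, and it assumes URLs include a scheme such as http://.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to host (minus 'www.') plus path (minus trailing slash)."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def normalize_title(title):
    """Lowercase a title, replace punctuation with spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(kept.split())


def dropped_journals(earlier, later):
    """Return (title, url) rows from `earlier` that match nothing in `later`.

    A journal counts as still present if either its normalized URL or its
    normalized title appears in the later list (an assumed matching rule).
    """
    later_urls = {normalize_url(url) for _, url in later}
    later_titles = {normalize_title(title) for title, _ in later}
    return [
        (title, url)
        for title, url in earlier
        if normalize_url(url) not in later_urls
        and normalize_title(title) not in later_titles
    ]


# Hypothetical rows, not real DOAJ data.
december = [
    ("Journal of Examples", "http://www.example.org/joe/"),
    ("Revista Hipotética", "http://example.net/rh"),
]
may = [("Journal of Examples", "https://example.org/joe")]
print(dropped_journals(december, may))
```

Matching on either field is forgiving of journals that changed URLs or tweaked titles between snapshots. That is one plausible reason a more careful later pass could find a few journals that were still there after all.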

A Few Specifics

  • Only 23% of journals with 2015 articles are gone–26% of free journals, 17% of APC-charging journals.
  • The article count is down 22%.
  • Dropped free journals have been declining in article count: the dropped group includes 33% of articles in those journals in 2011, down to 27.5% in 2015.
  • The largest (600+ articles) and smallest (0-19) journals disappeared more frequently than midrange journals.
  • Among fee-charging journals, those with lower fees disappeared more often than those with higher fees–30-31% for $2-$199 and $200-$599, 10% and 3% for $600-$1,399 and $1,400+.
  • Separating APCLand as a virtual region, the highest percentage of dropped journals is not in the global South: It’s what I call Pacific/English [with apologies to Quebec]–Australia, New Zealand, Canada and the United States: 39%. Asia had the second highest percentage of dropped journals, 35% (and by far the highest percentage of dropped articles, 47%), followed by the Middle East and Latin America (both 33%, but Latin America’s article loss is much lower). The lowest percentage of dropped journals is in Eastern Europe, at 18%. Given that OAWorld’s 29% is the baseline here, only Pacific/English and Asia had outsize losses.
  • Looking at categories of publishers (explained in the book), society and university journals dropped marginally more than average and traditional and multijournal OA publishers dropped substantially less (around 17% in both cases); the biggest losses are among “miscellaneous publishers,” those with only one or two journals.
  • 34 countries had no losses, although that’s only 97 journals.
  • The highest journal losses (by number) come from the United States, Brazil, India, Spain and Turkey–but the highest article losses come from India (50% more than the U.S.), the United States (more than twice Brazil’s number), Brazil, China, Turkey and Japan (Spain is 12th).
  • Percentagewise, among countries with a fair number of journals, Japan has the highest article loss. Looking at the five countries with the largest numbers of dropped journals, the U.S. lost 40% of journals claiming to be published here but only 19% of articles; Brazil lost 27% of journals and 17% of articles; India dropped 36% of journals and 42% of articles; Spain dropped 20% of journals and articles; and Turkey dropped 37% of journals and 47% of articles.
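
For readers who want to reproduce this kind of breakdown from a spreadsheet, the fee-band percentages above can be sketched roughly as follows. The band boundaries come from the list above; everything else (function names, the (apc, was_dropped) row shape, the sample rows) is hypothetical and only illustrates the arithmetic.

```python
from collections import defaultdict


def fee_band(apc):
    """Map an APC in whole US dollars to the fee bands named in the post."""
    if apc <= 0:
        return "free"
    if apc < 200:
        return "$2-$199"
    if apc < 600:
        return "$200-$599"
    if apc < 1400:
        return "$600-$1,399"
    return "$1,400+"


def drop_rates(journals):
    """Return {band: percent of journals dropped} for (apc, was_dropped) pairs."""
    totals = defaultdict(int)
    dropped = defaultdict(int)
    for apc, was_dropped in journals:
        band = fee_band(apc)
        totals[band] += 1
        if was_dropped:
            dropped[band] += 1
    return {band: 100 * dropped[band] / totals[band] for band in totals}


# Made-up rows for illustration only; these are not figures from the study.
sample = [(0, True), (0, False), (150, True), (150, False),
          (150, False), (150, False), (2000, False)]
print(drop_rates(sample))
```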

I suspect this will serve as a wake-up call for a fair number of university and society publishers and for publishers in some countries. In other cases…well, I see a baker’s dozen of publishers with 10 or more dropped titles (the largest is 45), and there are at least two or three of those that may not be missed.

Again, this is all preliminary, off-the-cuff, subject-to-change commentary. The book will be free (in PDF form) when it comes out, and that final chapter may be part of a C&I extended excerpt: those numbers should be better.

Cites & Insights 16:4 (May 2016) now available

April 26th, 2016

The May 2016 issue of Cites & Insights, volume 16 issue 4, is now available for downloading at

The issue is 13 pages long. If you’re reading it online or on a tablet, you may prefer the one-column 6″x9″ edition at That version is 26 pages long (and lacks one extraneous paragraph).

The short but meaty issue includes:

The Front  p. 1

Why it’s short.

Intersections: Two Worlds of Gold OA: APCLand and OAWorld pp. 2-5

A preview of some key data from Gold Open Access Journals 2011-2015, offered partly because I believe it is a new and useful way of looking at gold OA and am inviting feedback (fairly soon, since I’ll start on the book next week).

Policy: Google Books: The Final Chapter?  pp. 6-13

The Supreme Court won’t hear the Authors Guild appeal of the appeals court’s decision in Google’s favor. Maybe–maybe–the decade-long struggle is over. That’s worth a quick roundup of Google Books items since the last roundup.

Ideas for Gold Open Access Journals 2011-2015: Second Call

April 21st, 2016

If you have opinions on what was great or not so great in The Gold OA Landscape 2011-2014, or ideas on how the book-length analysis and presentation could be better for the new, much more complete Gold Open Access Journals 2011-2015, I’d like to hear from you–ideally before May 1, 2016 (I’ll start working on the book right around then). (Note that the PDF ebook version will be free and freely available with a CC BY license; the paperback will be priced at roughly production cost.)

Which tables and graphs seem especially worthwhile? Which writeups were more or less informative?

Since most of you haven’t seen the full book, there are two resources to base your feedback on:

  • The October 2015 Cites & Insights includes about half the text and around half the tables from the book, but none of the graphs.
  • The April 2016 Cites & Insights includes an introductory essay but mostly consists of pages 39 through 73 of the book, chapters 5 through 9, showing exactly what’s in the book.

(Note: you can reasonably ignore the “Why Anonymize?” section of the introductory essay in that issue: in consultation with SPARC, I’ve decided to make the non-anonymized spreadsheet openly available when the analysis is complete. One very minor consequence of non-anonymity: seven small journals that I’d flagged as questionable for judgmental reasons are no longer flagged. That doesn’t affect the analysis at even 0.1% levels.)

Both links are to the 6×9″ “online” versions, which better reflect the book pages.

You can comment directly on this post (for a week or two) or, better yet, send email to I don’t promise to use your suggestions; I do promise to think about them seriously.

(I’ll be asking for feedback on one very new and fairly distinctive aspect of the 2011-2015 survey, which arose from a decision to look at countries by region–but I’ll have more to say about that next week, I think, in a blog post and as part of a short Cites & Insights.)


Minor updates:

  • If you’re following my recovery from surgery (excision of a Schwannoma, a benign nerve sheath tumor): No, I’m not back to full touch typing; have begun hand therapy and ordered Dragon NaturallySpeaking. Posting and C&I still much reduced and the textual portions of the book may be more concise than otherwise–which could be a good thing.
  • I’ve completed the second data-gathering pass for the 2011-2015 project. The number of fully-analyzed “good” journals is up from 9,512 to 10,324, and the rough estimate of total articles from those journals for 2015 is around 566,000.
  • Yes, there will be a Cites & Insights soon, probably before May 1; no, it probably won’t be very long, given the difficulties of six-fingered typing…

Warriors Classic 50 Movies, Disc 1

April 5th, 2016

Fifty movies about an Oakland basketball team: who woulda thunk it? OK, so they’re really “sword and sandals” movies—all those Hercules, Son of Hercules, Colossus, Ursus and similar pictures, strong on Legendary Heroes, usually strong on magic and gods/goddesses, with lots of wholly innocent beefcake and (usually) cheesecake, usually some humor along with lots of fighting, loads of scenery, surprisingly good production values and plots that don’t always make much sense. Oh, and really bad dubbing, except sometimes for the one or two American actors. These are fun movies, mostly Italian, and I grade them within their own realm: a really great sword-and-sandals flick might not be a classic in traditional Hollywood terms. It’s a thirteen-disc set (there aren’t many hour-long sword-and-sandals flicks); Part 1 covers discs 1-6.

Hercules and the Masked Rider (orig. Golia e il cavaliere mascherato), 1963, color. Piero Pierotti (dir.), Alan Steel (that is, Sergio Ciani), Mimmo Palmara, José Greci, Pilar Cansino, Arturo Dominici. 1:26 [1:23]

Who knew that Hercules (“Alan Steel”) was not only a demigod but a time traveler? In this flick (clearly shot in widescreen and panned-and-scanned, more’s the pity), he’s jumped from the second century BC to the 16th century CE, since there are at least two handguns along with the many swords—and he’s somehow riding with a band of gypsies in Spain. (According to the source of all knowledge, this character was Goliath in the Italian original, but that still involves time travel, albeit only 16 rather than 18+ centuries—and Goliath wasn’t an immortal demigod. Hey, it’s swords-and-sandal magic!)

This means that—other than Hercules, who seems allergic to shirts, and a few of the evil Don’s soldiers who wind up naked after being humiliated by the gypsies and Hercules—everybody’s fully clothed, from head to toe. (Even Hercules has a shirt on for maybe three minutes total.) It also means that there are no gods & goddesses, no magic (although the Evil Don would happily burn the head gypsy as a witch), just lots of plot.

Plot. Hard to say whether it’s ever worth describing the plot in these spectaculars, but here it’s two Dons with their lands on either side of a river—and the Don on one side is pure evil, just loving to hunt down innocent peasants trying to escape from forced labor and really loving the occasional torture opportunity. The other Don is aging, has a beautiful daughter, and is unwilling to risk war with the evil Don—to the extent that he’s willing to marry his daughter off to the evil Don in the thought that this might prevent war. Foolish (and soon dead) man! Meanwhile, the aged Don’s nephew, the actual love of the daughter (well, why not? they’re first cousins, but it’s 16th-century Spain), has returned from battle (after meeting up with the gypsies, fighting Hercules to a draw in a one-hour contest that earns him not only his life but the welcome of the gypsies), and thinks this is all a terrible idea. He becomes the Masked Rider and…

Lots’o’plot ensues, and of course things all work out in the end. (Hercules isn’t really the primary character, but he’s there now and then. Some reviewers compared the real protagonist, the cousin, to Zorro: that’s not too far off.) And, you know, even though the premise is even more bizarre than usual, it’s fun. Good score, pretty good print. I’ll give it $1.50.

Spartacus and the Ten Gladiators (orig. Gli invincibili dieci gladiatori), 1964, color. Nick Nostro (dir.), Dan Vadis, Helga Line, Ivano Staccioli/John Heston, Alfredo Varelli/John Warrell, Ursula Davis, Giuliano Dell’Ovo/Julian Dower. 1:39

What this movie has in common with the previous one: in both cases, the titular character is not the major protagonist—Spartacus is there for maybe a third of the picture, and the biggest of the ten gladiators (who in this case aren’t slaves but entertainer/warriors) is the protagonist (and, in the end, rides away with The Girl).

Otherwise: set in Roman times, with the Ten Gladiators blackballed by the primary entrepreneur (because the big one almost spears a Roman senator instead of killing the winner of a 12-person to-the-death battle who refused to kill his father, one of the others) saving a senator’s daughter from Bad Thieves and being recruited by the senator to find and kill (they prefer capture) Spartacus, who is supposedly thieving. They find and meet Spartacus (involving an apparently hours-long battle between the big guy and Spartacus, ending with both of them collapsed and laughing) and join his cause—which is, mostly, to take his group back to Thrace and freedom.

The gladiators say they’ll go back and try to sell that to the senator (with the promise that he’ll be sent ransom money for the group later)…who says “sure, why not?” and drugs them over dinner, putting them in the dungeon.

There’s more plot—and, other than the sheer stupidity of the gladiators and the apparent deal that knocking an enemy out means he’s out of the action forever, it’s not as implausible as you might expect—ending with a reasonably satisfactory conclusion. The overall lesson: if the venal, vicious Senator Varro had let a hundred or so slaves escape, he would have avoided destroying a major part of the Roman army—and dying in the process. But, you know, power demands respect, especially wholly corrupt power.

Lots of fights, of course, with swords but the good guys prefer punching the other guys out; very little blood shown; some humor; the gladiators almost never wear anything above the waist or more than a foot or so below, if that matters; and the kind of production values (thousands of extras, huge battle scenes) you expect from these movies. I was particularly taken with one plot point: the gladiators, trying to figure out how to free the slaves held in a compound that combines mining with aqueduct-building, capture a blacksmith and convert him to the cause by noting that, if they free the slaves, there will be thousands of chains and handcuffs that he can melt down and make into shields and the like. He winds up being one of the foremost warriors in the grand battle.

Excellent print, great production values, but a narrow view of a wide-screen movie. Still, another $1.50.

The Conqueror of the Orient (orig. Il conquistatore dell’Oriente), 1960, color. Tanio Boccia (dir.), Rik Battaglia, Irene Tunc, Paul Muller. 1:26 [1:14]

The story of Dakar, an Evil Usurper who’s murdered the king (or sultan) and seized the throne, with an army that seems to go around burning villages for fun (which makes it difficult to provide the required tributes), and along the way found a beautiful young woman, Fatima, whom Dakar would make the first of his many wives. We’re also introduced to a young fisherman, Nadir (trawling in the river), and his elder. A bit later, Fatima escapes and is next found floating in a little boat about to hit rapids—and, of course, Nadir rescues her. (Perhaps the name “Nadir” is a clue as to the quality of this flick.)

One thing leads to another, Fatima is recaptured, the fisherman vows vengeance, and of course we learn that he’s the legitimate heir to the throne—and after lots of talk, more talk, some really bad scimitar-fights, and the like, he slays the usurper and brings eternal peace to his kingdom.

Pretty bad. The English-language scriptwriter appears to have had English as a third language (at one point, having been captured, our hero is left behind bars “until thirst and famine shall end his life.” Famine? Really?). The production values are at best OK, and the plot makes little sense. Maybe the missing 12 minutes would help; probably not. Charitably, $0.75.

The Last of the Vikings, 1961, color. Giacomo Gentilomo (dir.), Cameron Mitchell, Edmund Purdom, Isabelle Corey. 1:43.

“Prince Harald needs more wood!” That cry as hundreds of trees are being felled by wholly inept axe-wielders is probably the best dialogue in this mess. We also learn that Vikings fight by waving axes around a lot, that axes defeat bows and arrows even at long range, that some kings are hand-rubbing gibbering incarnations while princes just laugh a lot…and that perfidy runs deep in Norway.

As to the plot and acting and scenery…well, this was the first old flick I’d watched in almost three months (the DOAJ project was more fun); I was watching it the day after surgery; I was on low-dose opioids…without all of which I might not have made it all the way through. Maybe, charitably, $0.75.

Recovery: a short, slow post

March 31st, 2016

Since I’ve left notes elsewhere saying I’m mostly offline for the next [1:n] days [where n is indeterminate], I thought a little more detail might be in order:

  • The surgery: removing a Schwannoma (a benign nerve sheath tumor) from my right forearm–a visible bump perhaps 1.2″ long and 1.3″(?) high, determined to be benign by a January needle biopsy–which also irritated the lump and caused it to grow.
  • When? Tuesday, March 29, around 3:30 pm, at Stanford Hospital, Dr. David G. Mohler (who did a great job).
  • Pain? Not bad: of the allowed 2 pills every 6 hours, I needed 1 pill Tuesday afternoon, 1 at bedtime, 1 Wednesday a.m. (10 hrs later) and, since then, 1/2 pill every eight hours. Good chance I’ll stop altogether tomorrow. (OTOH, my metabolism appears to be tough on drugs: the whole-arm nerve block, intended to last 8-12 hours, lasted about 3.5 hours. General anesthesia not wanted or needed.)
  • Problems? Maybe just reality: after trauma to the tendons and muscles and nerves in the arm, my fingers aren’t back to normal. (But gripping, etc. is pretty much OK.)

So I mostly need to let my right arm rest until the swelling goes down. I’ve seen how hard it is to work online without instinctively using both hands. So I’m mostly staying off. Two fingers are starting to come back to semi-normal; the rest could take a day, or three, or a week.

Otherwise? There’s leeway enough in The Big Project; I’m feeling good enough that I went for the daily walk around the 1.3-mile block with my wife today.

Thanks for the expressions of concern.

Making the case (a follow-up post)

March 26th, 2016

A while back, I wrote a post explaining why the dataset for Gold Open Access Journals 2011-2015 will not include journal names and publishers, and invited people to send me email explaining possible positive use cases if that decision was changed.

I’ve received one such email so far, resulting in an exchange of email; I’ve saved it for later consideration.

Meanwhile, a tweetstorm has erupted that seems to say that my work is useless if I don’t provide the full data. Apparently the other post is too long to read (or didn’t get read), so here’s a slightly different and shorter version–but you still need to read the other post before you respond.

  • If somebody attempted to replicate the research starting in, say, July 2016, the results will be different for some significant number of journals, for several reasons (some of them having to do with what gets counted, some of them having to do with delays in posting, some because journals that yield 404s in March may not in July or vice-versa).
  • Somebody out to snipe or discredit will also look at individual journals and disagree with my choice of which of 28 broad subjects to assign it to; in quite a few cases, more than one choice is reasonable.
  • I’m very interested in use cases–cases where useful additional research would be possible based on a non-anonymized spreadsheet. (In some such cases, the dataset will be made available to the group or person–I’ve already done that for the previous dataset.) If there are convincing cases, I’d talk to SPARC about whether it makes sense to open up the data completely. And hope that I don’t spend the rest of the year dealing with a stream of “But THIS NUMBER’S WRONG, so your whole study’s worthless” or “But THIS JOURNAL’S REALLY ABOUT X, so your whole study’s worthless” or variants of that.
  • Email (to calmly suggesting positive use cases will be dealt with politely and taken into account. Head-on attacks 140 characters at a time are, shall we say, less likely to persuade me. (Well, they might persuade me never to get involved in this kind of project again, so if that’s your motive…)
  • Oh, and by the way: This isn’t about hiding methodology. I’ve never done so, and don’t plan to start now.

I’ll be off the air entirely for several days beginning the evening of March 28, so email may not receive quick responses at that point. Meanwhile, I’d like to get back to getting something done.

In partial defense of Jeffrey Beall

March 25th, 2016

Not in defense of his lists, which I regard as a bad idea in theory and fatally flawed in practice, for reasons I’ve documented (most recently here but elsewhere over time).

But…I’ve seen some stuff on another blog lately that bothers me.

  • I do not for a minute believe that Jeffrey Beall wrote the supposed email I’ve seen that suggests a listed publisher would be re-evaluated for $5,000. That email was written using English-as-a-third-language grammar; it’s just not plausible as coming from Beall.
  • I truly dislike the notion that a doctorate is the minimum qualification for scholarship. But then, I would, wouldn’t I (since my pinnacle of academic achievement is a BA and a handful of credits toward an MA).
  • I also dislike the notion that state colleges are somehow disreputable. My own degree comes from a state institution, and I’ll match its credentials with anybody.

The same blog had an interesting fisking of one of Beall’s sillier anti-OA papers. I had tagged it toward a future Cites & Insights essay on access and ethics. But after seeing this other stuff…I won’t link to or source from this particular blog.  Heck, I’ve been the subject of Beall’s ad hominem attacks; doesn’t mean I have to support that sort of thing.

Cites & Insights 16:3 (April 2016) available

March 23rd, 2016

The April 2016 Cites & Insights (16:3) is now available for downloading at

That print-oriented version is 30 pages long. If you’re planning to read online or on an ereader, you may prefer the single-column 6″ x 9″ version, 59 pages long, available at

While much of this issue has appeared as a series of posts in this blog, the final section of the lead essay is new, as is the fourth essay; the final section reprints 35 pages of The Gold OA Landscape 2011-2014 to serve as context for a portion of the first essay.

This issue includes:

The Front: Gold Open Access Journals 2011-2015: A SPARC Project pp. 1-8

Remember the “watch this space” note in the February-March “The Front”? This is what it was about. This essay includes the key announcement, a partial list of changes from the 2011-2014 project, a partial checkpoint prepared when I was halfway through the first pass, a section asking for possible “changes for the better” in the analysis and writeup (note that this year’s PDF ebook will be free and OA, since it’s a SPARC-sponsored project), another section discussing the planned anonymization of the (free) spreadsheet when analysis is done–and, new to this issue, a second checkpoint prepared at the end of the first journal pass.

The Front (also): Readership Notes  pp. 8-9

Notes on the most frequently downloaded issues in Volume 15 and the most frequently downloaded issues overall.

Intersections: “Trust Me”: The Other Problem with Beall’s Lists  pp. 9-11

As far as I can tell, Jeffrey Beall provides no evidence whatsoever–not even his classic “this publisher has a funny name”–for seven out of eight journals and publishers on his 2016 lists. This piece, which has a little additional material beyond the original post, goes into some detail.

The Back  pp. 11-12

Not precisely filler to get an even number of pages, but…OK, so these three mini-rants are mostly filler to get an even number of pages.

The Gold OA Landscape 2011-2014, pp. 39-73   following page 12

I’m including chapters 5 (starting dates), 6 (country of publication), 7 (segments and subjects), 8 (biology and medicine) and 9 (biology) to provide more context for my invitation to suggest better ways to analyze and present the 2011-2015 data. Please note that these pages appear precisely as they would in the PDF ebook if you’re looking at the online 6″ x 9″ version (since the book’s 6″x9″), but are reduced very slightly for the print-oriented version (to 5.5″x8.5″) so that two book pages will fit on one printed page.

Next issue?

I did not label this the April-May 2016 issue. Whether there’s a May issue in late April or early May, or a May-June issue later in May, depends on a number of factors having mostly to do with Gold Open Access Journals 2011-2015.

Why Anonymize?

March 14th, 2016

The project plan for Gold Open Access Journals 2011-2015 calls for me to make an anonymized version of the master spreadsheet freely available—and as soon as the project was approved, I made an anonymized version of the 2014 spreadsheet available.

Two people raised the question “Why anonymized?”—why don’t I just post the spreadsheet including all data, instead of removing journal names, publishers and URLs and adding a simple numeric key to make rows unique?
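
A minimal sketch of what that anonymization step amounts to, in Python: drop the identifying columns and add a sequential key. The column names ("Journal", "Publisher", "URL", "Key") and the sample row are hypothetical stand-ins, not the real spreadsheet's headers or data.

```python
import csv
import io

# Assumed names for the identifying columns; the real headers may differ.
IDENTIFYING = {"Journal", "Publisher", "URL"}


def anonymize(rows):
    """Drop identifying columns and add a simple numeric key to each row."""
    result = []
    for key, row in enumerate(rows, start=1):
        kept = {name: value for name, value in row.items()
                if name not in IDENTIFYING}
        result.append({"Key": key, **kept})
    return result


# Hypothetical one-row spreadsheet, read the way a CSV export would be.
sample = io.StringIO(
    "Journal,Publisher,URL,APC,Articles2015\n"
    "Journal of Examples,Example Press,http://example.org,0,42\n"
)
for row in anonymize(csv.DictReader(sample)):
    print(row)
```

If row order alone could identify journals (an alphabetical list, say), the rows would need reordering before keys are assigned; the sketch does not address that.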

The short answer is that doing so would shift the focus of the project from patterns and the overall state of gold OA to specifics, and lead to arguments as to whether the data was any good.

Maybe that’s all the answer that’s needed. Although I counted very little use of the 2014 spreadsheet in January and February 2016, it’s been used more than 900 times in the first half of March 2016—but I have received no more queries as to why it’s anonymized. For any analysis of patterns, of course, journal names don’t matter. But maybe a slightly longer answer is useful.

That longer answer begins with the likelihood that some folks would try to undermine the report’s findings by claiming that the data is full of errors—and the certainty that such folks could find “errors” in the data.

Am I being paranoid in suggesting that this would happen? Thanks to Kent Anderson, I can safely say that I’m not, since within a day or two of my posting the spreadsheet, he tweeted this:

Anderson didn’t say “Am I misunderstanding?” or “Clarification needed” or any alternative suggesting that more information was needed. No: he went directly on the attack with “Errors exist” (by completely misreading the dataset, as it happens: around 500 gold OA journals began publication, usually not as OA, between 1853 and 1994).

It’s not wrong, it’s just different

To paraphrase Ed and Patsy Bruce (they wrote the song, even though Willie Nelson and Waylon Jennings had the big hit with it)…

If somebody else—especially someone looking to “invalidate” this research—goes back to do new counts on some number of journals, they will probably get different numbers in a fair number of cases.

Why? Several reasons:

  • Inclusiveness: Which items in journals—and which journals—do you include? The 2014 count tended to be more exclusive when I had to count each article individually; the 2015 count tends to include all items subject to some form of review, including book reviews and case reports. Similarly, the 2015 report includes journals that consist of (reviewed) conference reports (although I’ll note the subset of such journals).
  • Shortcuts: I did not in fact look at each and every item in each and every issue of each and every journal, compare it to that journal’s own criteria for reviewed or peer-reviewed, and determine whether to include it. To do that, I’d estimate that a single year’s count would require at least 2,000 hours exclusive of determining APC existence and levels and all other overhead—and, of course, a five-year study would require four times that amount (fewer journals and articles in earlier years). That’s not plausible under any circumstances. Instead, I used every shortcut that I could: publication-date indexes or equivalent for SciELO, J-Stage, MDPI, Dove and several others; DOI numbers when it’s clear they’re assigned sequentially; numbered tables of contents; Find (Ctrl-F) counts for distinctive strings (e.g., “doi:” or “HTML”) after quick scans of the contents tables. For the latter, I did make rough adjustments for clear editorials and other overhead.
  • Estimates: In some cases—fewer in 2015 than in 2014, but still some—I had to estimate, as for instance when a journal with no other way of counting publishes hundreds of articles each year and maintains page numbering throughout a dozen issues. I might count the articles in one or two issues, determine an average article length, and estimate the year’s total count based on that length. I also used counts from DOAJ in many cases, when those counts were plausible based on manual sampling.
  • Errors: I’m certain that my counts are off by one or two in some cases; that happens.
  • Late additions: Some journals, especially those that are issue-oriented and still include print versions, post online articles very late. Even though I’m retesting all cases where the “final issue” of 2015 seemed to be missing when checked in January-March 2016, it’s nearly certain that somebody looking at some journals in, say, August 2016 will find more 2015 articles than I did.
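
Two of the shortcuts above lend themselves to a small sketch: counting a distinctive string in a contents page (the Ctrl-F approach) and estimating a year's articles from continuous page numbering. This is an illustration under stated assumptions, not the procedure as actually carried out; the function names, the overhead adjustment and the sample figures are invented for the example.

```python
def count_marker(contents_text, marker="doi:"):
    """Count case-insensitive occurrences of a distinctive string."""
    return contents_text.lower().count(marker.lower())


def adjusted_count(marker_hits, overhead_items=0):
    """Subtract clear editorials and other overhead from a raw marker count."""
    return max(marker_hits - overhead_items, 0)


def estimate_articles_from_pages(sampled_articles, sampled_pages, total_pages):
    """Estimate a year's article count from average article length.

    sampled_articles and sampled_pages come from the one or two issues
    counted by hand; total_pages is the last page number of the year.
    """
    if sampled_articles <= 0 or sampled_pages <= 0:
        raise ValueError("need a non-empty sample")
    average_length = sampled_pages / sampled_articles
    return round(total_pages / average_length)


# Hypothetical figures: 20 articles fill 180 pages in the sampled issues,
# and the year's page numbering runs to 2,160.
print(estimate_articles_from_pages(20, 180, 2160))
print(adjusted_count(count_marker("doi:10.1/a DOI:10.1/b doi:10.1/c"), 1))
```

Both shortcuts trade exactness for speed, which is why a second person recounting the same journal by hand could reasonably land on a slightly different number.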

In practice, I doubt that any two counts of a thousand or more OA journals will yield precisely the same totals. I’d guess that I’m very slightly overcounting articles in some journals that provide convenient annual totals—and undercounting articles in some journals that don’t.

For the analysis I’m doing, and for any analysis others are likely to do, these “errors” shouldn’t matter. If somebody claimed that overall numbers were 5% lower or 5% higher, my response would be that this is quite possible. I doubt that the differences in counts would be greater than that, at least for any aggregated data.

Making the case

If you believe I’m wrong—that there are real, serious, worthwhile research cases where only the unanonymized version will do—let me know (

Obviously, anonymized datasets aren’t unusual; I don’t know of any open science advocate who would seriously argue that medical data should be posted with patient names or that libraries should keep enough data to be able to do analysis such as “people who borrowed X also borrowed Y.” In practice, there may be special use cases for an open copy of the master spreadsheet. On the other hand, except for the list of journals flagged as having malware on their sites, I’ll be doing my analysis with the anonymized spreadsheet—it’s what’s needed for this work, and won’t distract me with individual journal titles and how I might feel about their publishers.