PPPPredatory Article Counts: An Investigation Part 3

November 11th, 2015

If you haven’t read Part 1 and Part 2—and, to be sure, Cites & Insights December 2015—none of this will make much sense.

What would happen if I replicated the sampling techniques actually used in the study (to the extent that I understand the article)?

I couldn’t precisely replicate the sampling. My working dataset had already been stripped of several thousand “journals” and quite a few “publishers,” and I took Beall’s lists a few months before Shen/Björk did. (In the end, the number of journals and “journals” in their study was less than 20% larger than in my earlier analysis, although there’s no way of knowing how many of those journals and “journals” actually published anything. In any case, if the Shen/Björk numbers had been 20% or 25% larger than mine, I would have said “sounds reasonable” and let it go at that.)

For each tier in the Shen/Björk article, I took two samples, both using random techniques, and for all but Tier 4, I used two projection techniques—one based on the number of active true gold OA journals in the tier, one based on all journals in the tier. (For Tier 4, singleton journals, there’s not enough difference between the two to matter much.) In each tier, I used a sample size and technique that followed the description in the Shen/Björk article.
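The two projection techniques can be sketched as a few lines of code. This is a hypothetical illustration, not the study's or the survey's actual procedure: the function name, data layout, and toy numbers are mine. The idea is that a tier total can be projected either from the mean articles per active gold OA journal in the sample, or from the mean per listed journal (counting the non-OA and empty "journals" as zeros in the base).

```python
def project_articles(sample, tier_active, tier_total):
    """Project a tier's total articles from a sample, two ways.

    sample: per-journal article counts; None marks a listed "journal"
        that turned out not to be an active gold OA journal.
    tier_active: number of active gold OA journals in the whole tier.
    tier_total: number of all listed journals in the whole tier.
    Returns (projection scaled by active journals,
             projection scaled by all listed journals).
    """
    active = [n for n in sample if n is not None]
    mean_per_active = sum(active) / len(active)   # articles per active journal
    mean_per_listed = sum(active) / len(sample)   # articles per listed journal
    return (mean_per_active * tier_active,
            mean_per_listed * tier_total)
```

When the share of inactive "journals" in the sample differs from the share in the tier, the two bases diverge, which is one reason a low and a high projection can come out of the same sample.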

The results were interesting. Extreme differences between the lowest sample and the highest sample include 2014 article counts for Tier 2 (publishers with 10 to 99 journals), the largest group of journals and articles, where the high sample was 97,856 and the low—actually, in this case, the actual counted figure—was 46,770: that’s a 2.09 to 1 range. There’s also maximum revenue, where the high sample for Tier 2 was $30,327,882 while the low sample (once again the counted figure) was $9,574,648: a 3.17 to 1 range—in other words, a range wide enough to explain the difference between my figures and the Shen/Björk figures purely on the basis of sample deviation. (It could be worse: the 2013 projected revenue figures for Tier 2 range from a high of $41,630,771 to a low of $8,644,820, a range of 4.82 to 1! In this case, the actual sum was just a bit higher than the low sample, at $8,797,861.)

Once you add the tiers together, the extremes narrow somewhat. Table 7 shows the low, actual, and high total article projections, noting that the 2013, 2012, and 2011 low and high might not be the actual extremes (I took the lowest and highest 2014 figures for each tier, using the other figures from that sample). It’s still a broad range for each year, but not quite as broad. (The actual numbers are higher than in earlier tables largely because journals in DOAJ had not been excluded at the time this dataset was captured.)

2014 2013 2012 2011
Low 134,980 130,931 92,020 45,605
Actual 135,294 115,698 85,601 54,545
High 208,325 172,371 136,256 84,282

Table 7. Article projections by year, stratified sample

The range for 2014 is 1.54 to 1: broad, but narrower than in the first two attempts. On the other hand, the range for maximum revenues is larger than in the first two attempts: 2.18 to 1 for 2014 and a very broad 2.46 to 1 for 2013, as in Table 8.

2014 2013
Low $30,651,963 $29,145,954
Actual $37,375,352 $34,460,968
High $66,945,855 $71,589,249

Table 8. Maximum revenue projections, stratified sample

Note that the high figures here are pretty close to those offered by Shen/Björk, whereas the high mark for projected article count is still less than half that suggested by Shen/Björk. (Note also that in Table 7, the actual counts for 2013 and 2012 are actually lower than the lowest combined samples!)

For the graphically inclined, Figure 4 shows the low, actual and high projections for the third sample. This graph is not comparable to the earlier ones, since the horizontal axis is years rather than samples.

Figure 4. Estimated article counts by year, stratified

It’s probably worth noting that, even after removing thousands of “journals” and quite a few publishers in earlier steps, it’s still the case that only 57% of the apparent journals were actual, active gold OA journals—a percentage ranging from 55% for Tier 1 publishers to 61% for Tier 3.


It does appear that, for projected articles, the stratified sampling methodology used by Shen/Björk may work better than using a pure random sample across all journals—but for projected revenues, it’s considerably worse.

This attempt could explain the revenue discrepancy, which in any case is much smaller (as noted, my average APC per article is considerably higher than Shen/Björk’s)—but it doesn’t fully explain the huge difference in article counts.

Overall Conclusions

I do not doubt that Shen/Björk followed sound statistical methodologies, which is quite different from agreeing that the Beall lists make a proper subject for study. The article didn’t identify the number of worthless articles or the amount spent on them; it attempted to identify the number of articles published by publishers Beall disapproved of in late summer 2014, which is an entirely different matter.

That set aside, how did the Shen/Björk sampling and my nearly-complete survey wind up so far apart? I see four likely reasons:

  • While Shen/Björk accounted for empty journals (but didn’t encounter as many as I did), they did not control for journals that have articles but are not gold OA journals. That makes a significant difference.
  • Sampling is not the same as counting, and the more heterogeneous the universe, the more that’s true. That explains most of the differences, I believe (on the revenue side, it can explain all of them).
  • The first two reasons, enhanced by two or three months of additional listings, combined to yield a much higher estimate of active journals than my survey: more than twice as many.
  • The second reason resulted in a much higher average number of articles per journal than in my survey (53 as compared to 36), which, combined with the doubled number of journals, neatly explains the huge difference in article counts.

The net result is that, while Shen/Björk carried out a plausible sampling project, the final numbers raise needless alarm about the extent of “bad” articles. Even if we accept that all articles in these projections are somehow defective, which I do not, the total of such articles in 2014 appears to be considerably less than one-third of the number of articles published in serious gold OA journals (that is, those in DOAJ)—not the “nearly as many” the study might lead one to assume.

No, I do not plan to do a followup survey of publishers and journals in the Beall lists. It’s tempting in some ways, but it’s not a good use of my time (or anybody else’s time, I suggest). A much better investigation of the lists would focus on three more fundamental issues:

  • Is each publisher on the primary list so fundamentally flawed that every journal in its list should be regarded as ppppredatory?
  • Is each journal on the standalone-journal list actually ppppredatory?
  • In both cases, has Beall made a clear and cogent case for such labeling?

The first two issues are far beyond my ken; as to the first, there’s a huge difference between a publisher having some bad journals and it making sense to dismiss all of that publisher’s journals. (See my longer PPPPredatory piece for a discussion of that.)

Then there’s that final bullet…

[In closing: for this and the last three posts—yes, including the Gunslingers one—may I once again say how nice Word’s post-to-blog feature is? It’s a template in Word 2013, but it works the same way, and works very well.]

PPPPredatory Article Counts: An Investigation Part 2

November 9th, 2015

If you haven’t already done so, please read Part 1—otherwise, this second part of an eventual C&I article may not make much sense.

Second Attempt: Untrimmed List

The first five samples in Part 1 showed that even a 20% sample could yield extreme results over a heterogeneous universe, especially if the randomization was less than ideal.

Given that the most obvious explanation for the data discrepancies is sampling, I thought it might be worth doing a second set of samples, this time each one being a considerably smaller portion of the universe. I decided to use the same sample size as in the Shen/Björk study, 613 journals—and this time the universe was the full figshare dataset Crawford, Walt (2015): Open Access Journals 2014, Beall-list (not in DOAJ) subset. figshare. I assigned RAND() to each row, froze the results, then sorted by that column. Each sample was 613 journals; I took 11 samples (leaving 205 journals unsampled but included in the total figures). I adjusted the multipliers.
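The procedure just described (attach a random key to each row, freeze it, sort, then slice off consecutive samples and scale up) might be sketched like this. The function name and toy data are mine, with Python’s random module standing in for Excel’s RAND():

```python
import random

def draw_samples(rows, sample_size=613, n_samples=11, seed=0):
    """Sketch of the sampling procedure described above.

    rows: per-journal 2014 article counts (zeros included).
    Returns one projected universe total per sample: each sample's
    sum, multiplied up to the size of the full universe.
    """
    rng = random.Random(seed)
    # Attaching a frozen random key and sorting by it is equivalent
    # to putting the rows in random order.
    shuffled = sorted(rows, key=lambda _: rng.random())
    multiplier = len(rows) / sample_size   # scale each sample to the universe
    return [sum(shuffled[i * sample_size:(i + 1) * sample_size]) * multiplier
            for i in range(n_samples)]
```

The multiplier is just universe size over sample size; with 6,948 rows and 613-journal samples, eleven samples leave 205 rows unsampled, as in the text.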

More than half of the rows in the full dataset have no articles (and no revenue). You could reasonably expect extremely varied results—e.g., it wouldn’t be improbable for a sample to consist entirely of no-article journals or of all journals with articles (thus yielding numbers more than twice as high as one might expect).

In this case, the results have a “dog that did not bark in the night” feel to them. Table 3 shows the 11 sample projections and the total article counts.

Sample 2014 2013 2012 2011
6 88,165 72,034 40,801 20,473
10 91,186 75,025 50,820 31,523
5 95,338 93,886 56,047 27,893
4 97,313 80,978 51,343 36,039
1 99,956 97,153 83,606 52,983
2 105,967 87,468 50,617 20,880
7 106,693 72,658 40,119 29,055
Total 121,311 99,994 64,325 34,543
9 127,747 100,653 73,326 32,075
3 140,292 122,128 77,958 36,634
8 154,754 114,360 79,323 35,632
11 160,591 143,312 91,011 53,579

Table 3. Article projections by year, 9% samples

Although these are much smaller samples (percentagewise) over a much more heterogeneous dataset, the range of results is, while certainly wider than for samples 6-10 in the first attempt, not dramatically so. Figure 3 shows the same data in graphic form (using the same formatting as Figure 1 for easy comparison).

Figure 3. Estimated article counts by year, 9% sample

The maximum revenue samples show a slightly wider range than the article count projections: 2.01 to 1, as compared to 1.82 to 1. That’s still a fairly narrow range. Table 4 shows the figures, with samples in the same order as for article projections (Table 3).

Sample 2014 2013
6 $27,904,972 $24,277,062
10 $32,666,922 $27,451,802
5 $19,479,393 $20,980,689
4 $24,975,329 $25,507,720
1 $30,434,762 $30,221,463
2 $30,793,406 $25,461,851
7 $30,725,482 $21,497,760
Total $31,863,087 $28,537,554
9 $29,642,696 $24,386,137
3 $39,104,335 $41,415,454
8 $36,654,201 $29,382,149
11 $35,420,001 $34,710,583

Table 4. Estimated Maximum Revenue, 9% samples

As with maximum revenue, so with cost per article: a broader range than for the last five samples (and total) in the first attempt, but a fairly narrow range, at 1.75 to 1, as shown in Table 5.

Sample 2014 2013
6 $316.51 $337.02
10 $358.25 $365.90
5 $204.32 $223.47
4 $256.65 $315.00
1 $304.48 $311.07
2 $290.59 $291.10
7 $287.98 $295.88
Total $262.66 $285.39
9 $232.04 $242.28
3 $278.73 $339.12
8 $236.85 $256.93
11 $220.56 $242.20

Table 5. APC per article, 9% samples and total

Rather than providing redundant graphs, I’ll provide one more table: the average (mean) articles per journal (ignoring empty journals), in Table 6.

Sample 2014 2013 2012 2011
6 27.85 20.59 20.66 16.79
10 29.35 20.75 22.73 23.10
1 30.06 25.54 38.13 38.41
5 30.26 27.63 27.18 20.88
4 31.46 22.86 23.42 29.90
2 33.94 24.79 25.08 15.14
7 34.66 20.68 20.17 22.48
Total 36.80 27.47 30.08 25.51
3 42.01 34.90 38.63 27.13
9 42.10 29.75 35.82 26.30
8 43.86 31.25 38.20 26.39
11 47.88 40.12 47.13 38.04

Table 6. Average articles per journal, 9% samples

Note that Table 6 is arranged from lowest average in 2014 to highest average; the rows are not (quite) in the same order as in Tables 3-5. The range here, 1.72 to 1, is even narrower. On the other hand, sample 11 does show an average articles per journal figure that’s not much below the Shen/Björk estimate.

One More Try

What would happen if I assigned a new random number (again using RAND()) in each row and reran the eleven samples?

The results do begin to suggest that the difference between my nearly-full survey and the Shen/Björk study could be due to sample variation. To wit, this time the article totals range from 64,933 to 169,739, a range of 2.61 to 1. The lowest figure is less than half the actual figure, so it’s not entirely implausible that a sample could yield a number three times as high.

The total revenue range is also wider, from $22.7 million to $41.3 million, a range of 1.82 to 1. It’s still a stretch to get to $74 million, but not as much of a stretch. And in this set of samples, the cost per article ranges from $169.22 to $402.89, a range of 2.38 to 1. I should also note that at least one sample shows a mean articles-per-journal figure of 51.5, essentially identical to the Shen/Björk figure, and that $169.22 is similar to the Shen/Björk figure.


Sampling variation with 9% samples could yield numbers as far from the full-survey numbers as those in the Shen/Björk article, although for total article count it’s still a pretty big stretch.

But that article was using closer to 5% samples—and they weren’t actually random samples. Could that explain the differences?

[More to come? Maybe, maybe not.]

PPPPredatory Article Counts: An Investigation, Part 1

November 9th, 2015

If you read all the way through the December 2015 essay Ethics and Access 2015 (and if you didn’t, you really should!), you may remember a trio of items in The Lists! section relating to “‘Predatory’ open access: a longitudinal study of article volumes and market characteristics” (by Cenyu Shen and Bo-Christer Björk in BMC Medicine). Briefly, the two scholars took Beall’s lists, looked at 613 journals out of nearly 12,000, and concluded that “predatory” journals published 420,000 articles in 2014, a “stunning” increase from 50,000 articles in 2010—and that there were around 8,000 “active” journals that seemed to meet Jeffrey Beall’s criteria for being PPPPredatory (I’m using the short form).

I was indeed stunned by the article—because I had completed a full survey of the Beall lists and found far fewer articles: less than half as many. Indeed, I didn’t think there were anywhere near 8,000 active journals either—if “active” means “actually publishing Gold OA articles” I’d put the number at roughly half that.

The authors admitted that the article estimate was just that—that it could be off by as much as 90,000. Of course, news reports didn’t focus on that: they focused on the Big Number.

Lars Bjørnshauge at DOAJ questioned the numbers and, in commenting on one report, quoted some of my own work. I looked at that work more carefully and concluded that a good estimate for 2014 was around 135,000 articles, or less than one-third of the Shen/Björk number—and my estimate was based on a nearly 100% actual count, not an estimate from around 6% of the journals.

As you may also remember, Björk dismissed these full-survey numbers with this statement:

“Our research has been carefully done using standard scientific techniques and has been peer reviewed by three substance editors and a statistical editor. We have no wish to engage in a possibly heated discussion within the OA community, particularly around the controversial subject of Beall’s list. Others are free to comment on our article and publish alternative results, we have explained our methods and reasoning quite carefully in the article itself and leave it there.”

I found that response unsatisfying (and find that I’ll approach Björk’s work with a much more jaundiced eye in the future). As I expected, the small-sample report continued (continues?) to get wider publicity, while my near-complete survey got very little.

The situation continued to bother me, because I don’t doubt that the authors followed appropriate methodology, yet I wonder how the results could be so wrong. How could they come up with more than twice as many active OA PPPPredatory journals and more than three times as many articles?

So I thought I’d look at my own work a little more, to see whether sampling could account for the wild deviation.

First Attempt: The Trimmed List

I began by taking my own copy of Crawford, Walt (2015): Open Access Journals 2014, Beall-list (not in DOAJ) subset. figshare. The keys on each row of that 6,948-row spreadsheet are designed to be random. The spreadsheet includes not only the active Gold OA journals but also 3,673 others, to wit:

  • 2,045 that had not published any articles between 2011 and 2014, including eight that had explicitly ceased.
  • 183 that were hybrid journals, not gold OA.
  • 413 that weren’t really OA by my standards.
  • 279 that were difficult to count (more on those later).
  • 753 that were either unreachable or wholly unworkable.

There were two additional exclusions: I deleted around 1,100 journals (at least 300 of them empty) from publishers that wouldn’t provide hyperlinked lists of their journal titles—and I deleted journals that are in DOAJ because there were even more reasons than usual to doubt the PPPPredatory label. (Note that the biggest group of that double-listed category, MDPI, has more recently been removed from Beall’s list.)

I wound up with 3,275 active gold OA journals, what I’ll call “secondary OA journals,” since I think of the DOAJ members as “serious OA journals” and don’t have a good alternative term.

As I started reworking the numbers, I thought there should be some accounting for the opaque publishers and journals. In practice, I knew from some extended sampling that most journals from opaque publishers were either empty or very small—and my sense is that most opaque journals (usually opaque because there are no online tables of contents, only downloadable PDF issues, but sometimes because there really aren’t streams of articles as such) are also fairly small. But still, they should be included. Since these two groups (excluding the 300-odd journals from opaque publishers that I knew were empty) added up to 32% of the count of active journals, I multiplied article and revenue counts by 1.32. (I think this is too high, but feel it’s better to err on the side that will get closer to the Shen/Björk numbers.)
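That adjustment reduces to one multiplication. As a trivial sketch (the function name is mine; 0.32 is the share of active journals described above):

```python
def adjust_for_opaque(counted, opaque_share=0.32):
    """Scale counted articles (or revenue) up to allow for opaque
    publishers and hard-to-count journals, which added up to 32%
    of the count of active journals."""
    return counted * (1 + opaque_share)
```

So a counted total of 100,000 articles becomes an adjusted 132,000; as noted, this probably overstates the opaque journals’ contribution, erring toward the Shen/Björk numbers.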

I did not factor in the DOAJ-included numbers, but the total of those and other already-counted additional articles (doubling 2014 since I only counted January-June) is around 43,000 for 2014; around 39,000 for 2013; around 37,000 for 2012; and around 28,000 for 2011. You can add them to the counts below if you wish—although I don’t believe these represent questionable articles.


Since 613 was the sample size in the Shen/Björk article, I took a similar size sample as a starting point, then adjusted it so I could take five samples that would, among them, include everything: that is, a sample size of 655 journals.

For each sample (sorting by the pseudorandom key, then starting from the beginning and working my way down), I took the article count for each year, multiplying by appropriate factors, and the revenue counts for 2013 and 2014 (determined by multiplying the 2014 APC by the annual article counts, then applying the appropriate multipliers—I didn’t go back before 2013 because APCs were too likely to have changed). I calculated average APC per article for 2014 and 2013 by straight division—and also calculated the average article count (not including zero-count journals because the cells were blank rather than zero) and median article count (also excluding zero-count journals). I also calculated standard deviation just for amusement.
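The per-sample figures described in that paragraph reduce to straightforward arithmetic. Here is a sketch with invented toy data; the function name and the (articles, APC) layout are mine, not the spreadsheet’s:

```python
import statistics

def sample_stats(journals):
    """Per-sample figures: maximum revenue (APC times articles, no
    waivers or discounts), APC per article by straight division, and
    mean/median articles per journal excluding zero-count journals.

    journals: list of (article_count, apc) pairs; article_count may be
        0 for a journal with no articles that year.
    """
    revenue = sum(n * apc for n, apc in journals)   # maximum revenue
    articles = sum(n for n, _ in journals)
    nonzero = [n for n, _ in journals if n > 0]     # drop zero-count journals
    return {
        "articles": articles,
        "max_revenue": revenue,
        "apc_per_article": revenue / articles,      # straight division
        "mean_articles": statistics.mean(nonzero),
        "median_articles": statistics.median(nonzero),
    }
```

Note that, as in the survey, the mean and median are computed only over journals that had articles in that year, which is why they can move independently of the totals.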

“Zero-count journals? Didn’t you eliminate zero-count journals?” I eliminated journals that had no articles in any year 2011-2014, but quite a few journals have articles in some years and not in others—including, of course, newish journals. For example, there were only 2,393 journals with articles in the first half of 2014; 2,714 in 2013; 1,557 in 2012 and 996 in 2011.

I also calculated the same figures for the full set.

Looking at the results, I was a little startled by the wide range, given that these samples were 20% of the whole: the 2014 projected article totals (doubling actual article counts, of course) ranged from 5,755 to 180,299! Now, of course, even that highest count is still much less than half of the Shen/Björk count—and just a bit over half if you add in the DOAJ-listed count.

So I added another column and assigned a random number to each row, using Excel’s RAND function, then froze the results and took a new set of five samples. The results were much narrower in range: 99,713 to 136,660. The actual total: 121,311 (including the 1.32 multiplier but not DOAJ numbers).

Table 1 shows the projected (or actual) article totals year-by-year and sample-by-sample, sorted so the lowest 2014 projection appears first. Note that samples 1-5 use the assigned pseudorandom keys, while samples 6-10 use Excel’s RAND function for randomization. Clearly, the latter yields more plausible results.

Sample 2014 2013 2012 2011
4 5,755 21,734 15,959 10,223
5 91,067 85,734 66,594 51,473
8 99,713 84,797 55,209 33,733
7 115,368 91,964 57,664 27,595
Total 121,311 99,994 64,325 34,543
6 123,050 104,808 57,295 22,605
9 131,762 106,181 82,790 53,869
10 136,660 112,220 68,666 34,914
3 159,284 121,097 75,933 27,628
1 170,148 138,890 87,371 56,027
2 180,299 132,515 75,768 27,364

Table 1. Estimated article counts by year

Adding the 43,000-odd articles from DOAJ-listed journals would bring these totals (ignoring samples 1-5) to around 143,000 to around 180,000 articles, with the most likely value around 165,000 articles: more than one-third of the Shen/Björk estimate but a lot less than half.

Note that “120,000 plus or minus 25,000” as an estimate actually covers all five samples that used the RAND-function randomization. Figure 1 shows the same data as Table 1, but in graphic form.

Figure 1. Estimated article counts by year

How much revenue might those articles have brought in, and what’s the APC per article? Keeping the order of samples the same as for Table 1 and Figure 1, Table 2 and Figure 2 show the maximum revenue (not allowing for waivers and discounts).

Sample 2014 2013
4 $2,952,893 $10,473,269
5 $1,677,496 $3,322,988
8 $30,184,480 $23,906,771
7 $35,939,416 $35,825,909
Total $31,863,087 $28,537,554
6 $31,010,206 $27,926,897
9 $31,165,754 $29,071,218
10 $31,015,578 $25,956,975
3 $82,610,167 $65,930,614
1 $34,247,360 $32,892,328
2 $37,827,517 $30,068,570

Table 2. Estimated maximum revenue, 2014 and 2013

This time there are two extremely low figures and one extremely high figure—with samples 6 through 10 all within $4.1 million of the actual maximum figure (for 2014: for 2013, the deviation is $7.3 million). Compare the $31.86 million calculated costs here with the $74 million estimated by Shen/Björk: the full-survey number is less than half as much.

Figure 2 shows the same information in graphical form.

Figure 2. Estimated maximum revenue, 2014 and 2013

Looking at APC per article, we run into an anomaly: where the Shen/Björk estimate is $178 for 2014, the calculated average for the full survey is considerably higher, $262.66. The range of the ten samples is from a low of $18.42 to a high of $513.08, but the five “good” samples range from $226.95 to $302.71, a reasonably narrow range.

Finally, consider the mean (average) number of articles per journal in 2014, in journals that had articles. The Shen/Björk figure is around 50; my survey yields 36.8. In fact, I show only 327 journals with at least 25 articles in the first half of 2014 (and only 267 with at least 50 articles in all of 2013).

The median is even lower—12 articles, or six in the first half—and that’s not too surprising. The standard deviation in most years was at least twice the average: as usual, these journals are very heterogeneous. How heterogeneous? In the first half of 2014, three journals had more than 1,000 articles each (but fewer than 1,300); six more had at least 500 articles; 16 had 250 to 499 articles—but at the same time, only 819 of the total had at least 11 articles in the first half of 2014, and only 1,544 had at least five articles in those six months.


I could find no way to get from these samples to the Shen/Björk figures. Not even close. They show too many active journals by roughly a factor of two, too many articles by a factor of close to three, and too much revenue by a factor of two—and too many articles per journal as well.

[Part 1 of 2 or 3…]

Note: This and following posts will also appear, probably in somewhat revised form, in the January 2016 issue of Cites & Insights.

Gunslinger Classics Disc 12

November 7th, 2015

As usual for these 12-disc fifty-movie sets, one disc has six short movies: this one. These are all oaters, B-movie programmers of an hour or less, mostly low-budget short-plot flicks. Four with John Wayne; one each with Bob Steele and Crash Corrigan.

Texas Terror, 1935, b&w. Robert N. Bradbury (dir. & screenplay), John Wayne, Lucile Browne, LeRoy Mason, Fern Emmett, George Hayes. 0:51.

Wayne’s the newly-elected sheriff. The man who pretty much raised him comes by the office, shows the wad of cash he’s withdrawn from Wells Fargo to restock his ranch now that his daughter’s coming home in a few months, notes that he’d tied his horse up behind Wells Fargo, and rides off. Almost immediately thereafter, three gunmen rob Wells Fargo; in chasing them, Wayne winds up in a shootout with results that make him believe (a) that he—Wayne—shot the old man (we know it was one of the gunmen) and (b) that the old man might have been one of the bandits, since they dumped the money bag and one wad of bills on his corpse. After the town (jury?) concludes that the old man had to have been a bandit—after all, people saw him tie up his horse behind Wells Fargo—Wayne resigns his position, turning it back over to the old sheriff (George Hayes, not in the Gabby persona). Wayne goes off, grows a beard, and becomes…well, that’s not clear.

Lots’o’plot, much of it involving the daughter, and most of it makes just as much sense as the idea that Wayne wouldn’t mention during the court hearing that the old man had told him his horse was tied up where it was. But hey, if you like lots of riding, some shooting, and a band of friendly Indians saving the day, I guess it’s OK. Generously, $0.75.

Wildfire, 1945, color. Robert Tansey (dir.), Bob Steele, Sterling Holloway, John Miljan, Eddie Dean. 0:59.

An unusual entry: late (1945) and in color, but still a one-hour flick with lots of riding, lots of shooting, a couple of good fights—and a singing cowboy (actually sheriff in this case, Eddie Dean) who gets the girl. The plot, not in the order it unfolds: a gang is rustling all the horses from ranches in one valley and blaming it on Wildfire, a wild stallion—and it turns out horse theft is a sideline: the motivation is for one gang member to buy up the ranches cheap, since he already has a contract to sell them to a big ranch for a big profit. Two itinerant horse-traders with a tendency to stay on the right side of the law wind up in the middle of this and expose it.

The color’s a little faded, but the whole thing’s good enough that I’d probably give it six bits—except for one thing: however they “digitized” this, at several points it looks like a projector losing its grip on film sprockets, losing chunks of the action and disrupting continuity. With that, it goes down to $0.50.

Paradise Canyon, 1935, b&w. Carl Pierson (dir.), John Wayne, Marion Burns, Reed Howes, Earle Hodgins, Gino Corrado, Yakima Canutt. 0:53.

John Wayne again, this time as a government agent sent to investigate counterfeit traffic that may be connected to a medicine show. (One person went to jail for ten years for counterfeiting, and may be running such a show.) He finds the show—which has a habit of leaving towns suddenly, either for not paying debts or because the proprietor tends to drink his own tonic, go to town, bust things up and not pay for them (his tonic is “90% alcohol,” which is 180 proof and should make it flammable). For that matter, he helps the show evade arrest by getting them across the Arizona/New Mexico border just ahead of the law, and joins the show as a sharpshooter.

The next town is a New Mexico/Mexico border town—and turns out the medicine show’s not really involved any more: instead, the counterfeiter, who framed the medicine man, is now operating out of a saloon on the Mexican side. One thing leads to another with lots of riding, lots of shooting and some true sharpshooting, and of course both the good guys winning and John Wayne getting the girl—with a mildly cute surprise ending.

The highlight is probably the medicine man’s pitch, a truly loopy piece of speechifying, including his assurance that he once knew a man without a tooth in his head…and that man became the best bass drum player he ever knew! All it takes is determination, and Doc Carter’s Famous Indian Remedy.

Not great, not terrible. Once again we have Yakima Canutt doing something more than trick riding—he’s the villain in the piece. (Wayne does not sing; the two singing entertainers in the medicine show are…well, that’s six minutes I’ll never get back again.) I’ll give it $0.75.

The Lucky Texan, 1934, b&w. Robert N. Bradbury (dir. & writer), John Wayne, Barbara Sheldon, Lloyd Whitlock, George Hayes, Yakima Canutt. 0:55.

This time, John Wayne’s Jerry Mason just out of college and returned to the ranch of old geezer Jake Benson, who more or less brought him up—and finds that the ranch’s cattle have all been rustled, but Benson’s opening up a blacksmith shop in town. Wayne immediately starts working there, and an early customer’s horse had picked up a stone—a stone that, when Wayne looks at it, seems to have gold in it. (It must have been a thriving smithy, since the geezer refuses payment for dealing with the horse’s problem…) Oh, and Benson’s pretty young granddaughter’s about to finish college (thanks in part to the geezer’s monthly checks) and returning soon.

One thing leads to another, and we have Wayne and Benson (not a TV series, but it could be) getting really good pure gold out of the site where they figured the horse had been; when they go to sell it, the assayer pays them…and then notes to his sidekick that he now “owned” most of Benson’s cattle.

More plot; the villains trick the geezer into signing a deed to the ranch; the sheriff’s son shoots the banker in a holdup just after Benson pays off the loan for the blacksmith shop (and Benson seems like a likely culprit until John Wayne Saves the Day)…and more. As always, it all works out in the end, which involves the usual Wayne-and-the-girl wedding. No singing; lots of fist fights (with no phony sounds—lots of grunting, but not much more); oddly enough, although two men are shot (and two others are shot at), there’s not a single death in the movie. There is, on the other hand, Wayne surfing down a sluice riding on a tree branch—and a chase scene involving Hayes semi-driving a car (he’d never driven before) and the villains on a powered railway car, in an almost slapsticky sequence. (That long chase is also the only time in an old Western I’ve ever seen The Hero, Wayne in this case, jump from his horse to tackle the villain on his horse…and miss, tumbling down a hill.)

George Hayes gets to show his dramatic abilities pretending to be his sister (you’d have to see it—he’d played the lead in Charley’s Aunt many years before, and does a good job in drag), and although he now has Gabby Hayes’ intonation and look, he’s not playing the fool by any means, and not even the sidekick—after all, it’s his ranch and his blacksmith shop. Another one with Yakima Canutt doing more than stunt riding (although he did plenty of that—apparently chasing himself at one point), once again playing a bad guy (something he was very good at). (I would note that many of the reviews at IMDB call George Hayes “Gabby” or “Gaby” Hayes—but he didn’t become Gabby Hayes until later on in his career.)

Maybe I’m getting soft as I near the end of this marathon, but this one seemed pretty good; I’ll give it $1.

Riders of the Whistling Skull, 1937, b&w. Mack V. Wright (dir.), Robert Livingston, Ray Corrigan, Max Terhune, Mary Russell, Roger Williams, Yakima Canutt, Fern Emmett, Chief Thundercloud. 0:58 [0:53]

A few archaeologists and a trio of cowboys known as The Three Mesquiteers are out to plunder a lost Indian city, or as they put it, rediscover it and recover all the golden treasure. A bunch of Native Americans don’t like this idea, and attempt to discourage them. One half-Native American, who passes himself off as one of the party, had previously kidnapped the father of the beautiful young (female) anthropologist and has been torturing him to reveal the location of the treasure.

Of course, this being a B Western from the 1930s, the plunderers are the heroes, and it’s a great thing that they manage to shoot at least half a dozen Native Americans and bury more of them under a wildly implausible collapse of half a mountain. Naturally, it all ends “well,” with the most handsome of the Mesquiteers getting the girl and an older and plainer woman (another sort-of archaeologist) getting the less handsome of the Mesquiteers. (In this one, Yakima Canutt plays the American Indian guide who’s in cahoots with the half-Native American.)

Reasonably well staged and with continuous action, but it’s also blatantly offensive. If you can ignore that, maybe $0.75.

Randy Rides Alone, 1934, b&w. Harry L. Fraser (dir.), John Wayne, Alberta Vaughn, George Hayes, Yakima Canutt, Earl Dwire. 0:53.

This cowboy riding along tops a ridge and spots the roof of a building—a halfway house saloon. He hears the honky-tonk piano and goes in…only to discover that everybody’s dead and the piano is a player piano. As he looks over the situation, including an open safe, the sheriff and his posse show up…and, naturally enough, arrest the cowboy. But we saw eyes moving in a painting on the wall…and after they’ve gone, a young woman steps out and inspects the scene.

Thus begins a story involving a hearing mute who runs a local store, the young woman breaking the cowboy out of jail so he can find the real killers, a gang hideaway for a gang run by…oh, let’s not give it all away. Lots of riding, a fistfight or two, some shooting, and of course all ends well. This time, George Hayes (not at all in the “Gabby” persona) plays the lead villain (and the—spoiler—mute shopkeeper) and Yakima Canutt plays the chief henchman.

The flick seems padded at 53 minutes, and Wayne is notable mostly for his young good looks. Generously, $0.75.

Double digits!

November 6th, 2015

I am delighted to say that The Gold OA Landscape 2011-2014 is now in the double digits, with two Ingram paperback sales and one Amazon paperback sale reported. (I’m guessing that I only see Ingram and Amazon numbers once a month. In terms of progress toward $ goals, three Ingram/Amazon sales equal about 1.3 Lulu sales, but I’m nonetheless delighted to see them.)

The balance still heavily favors print: ten paperback copies, two PDF site-licensed ebooks. (The ebooks are only available through Lulu because the global marketing channel will only accept ePub ebooks. Don’t ask me.)

Added a bit later: And thanks to worldcat.org, I see that five universities have the book–and that it’s available from Barnes & Noble as well. I think Ingram, B&N and Amazon are the totality of Lulu’s global marketing arrangements…

Linguistics, OA, $430 and $1,400–and a bit about The Gold OA Landscape 2011-2014

November 5th, 2015

I thought it might be interesting to glance at some existing gold OA journals at least partly devoted to linguistics in light of editorial goings-on at a notable subscription “hybrid” journal in the field.

This is a very incomplete group: it’s only journals I’d grouped into Language & Literature and that showed “linguis” somewhere within the DOAJ record (usually in the subject or keyword fields). That omits journals partly devoted to linguistics that fell into any number of other primary subject areas such as anthropology. But it’s a start…

The Basic Numbers

This group consists of 275 journals (including only those graded “A” and “B” in The Gold OA Landscape 2011-2014). The journals published 5,954 articles in 2011; 6,725 in 2012; 6,973 in 2013; and a slight drop to 6,415 in 2014.

Article Processing Charges

Twelve of the 275 journals have article processing charges; the remaining 263 are funded through other means.

Those twelve journals did publish more articles per journal than the others: in total, 1,007 in 2011; 1,298 in 2012; 1,418 in 2013; and 1,493 in 2014.

APCs range from $37 to $600, but only one journal charged more than $400 and only three charged more than $300. (The only fee-charging journal with more than 200 articles in 2014 charged $40.)

The maximum paid for APCs in the twelve fee-charging journals in 2014 was $364,146; that comes out to a weighted average of $244 per article. (The average for all articles in these journals is $56.76.)

Grades and Fees

Of the 263 no-fee journals, 250 don’t have any obvious problems. Of the thirteen graded B, two have problematic English; three have garish sites or other site problems; one features a questionable impact factor; six have minimal information; one had other issues.

Of the dozen fee-charging journals, seven don’t have obvious problems. Of the five graded B (obviously a much higher percentage than for no-fee journals), one has a questionable impact factor and four make questionable claims–actually, the same questionable claim in all four cases: they claim to be Canadian but show no indication of significant Canadian editorial involvement.

Anyway…that’s a little information about a few existing gold OA journals that are at least partially devoted to linguistics.

The Gold OA Landscape 2011-2014: Language and Literature

Just a few notes in addition to what’s in the excerpted version–hoping this might encourage a few people and libraries to buy the paperback or site-licensed PDF, or find ways to help me continue this research.

  • Most journals in this field are small, even by the standards of humanities and social sciences: 350 published 18 articles or fewer in 2014, as compared to 91 with 19 to 30 articles, 51 with 31 to 50 articles, 24 with 51 to 120 articles…and eight journals with more than 120 articles in 2014. (Seven of those eight journals charge APCs–but the one that doesn’t published one-quarter of all the articles in the big eight journals.)
  • Journals in 55 countries published articles in 2014. Only one country–Brazil–accounted for 1,000 or more articles. The United States and Canada followed (with more than 900 articles each–although that includes the Canadian journals that aren’t very Canadian). Spain was the only other country with more than 660 articles.

As always, there’s more in the book.

Quick status report: as of this morning (November 5, 2015):

  • At least 2,306 downloads of the Cites & Insights issue have happened
  • Seven copies of the book have been purchased, in addition to my own copy: six paperback, one PDF ebook. That’s one copy for every three hundred downloads. [Note added November 6, 2015: PDF ebook sales have now doubled–another copy was purchased. Total sales are still single-digit, but it’s progress.]

Why I’m not joining AAAS (a silly little post)

November 4th, 2015

Once in a while–maybe twice a year, and only since we moved to Livermore–I get a shrink-wrapped copy of Science that’s perhaps a month old, with an envelope enclosed inviting me to join AAAS for the super-low introductory price of $99. (Note that “join AAAS” is pretty much synonymous with “subscribe to Science,” and the discount seems to be honoring my nonexistent status as a scientist.)

Wonder why this has only happened since we moved to Livermore? I’m sure it has nothing to do with being in a small city of 85,000 people that includes two major labs–Lawrence Livermore and Sandia–employing more than 10,000 scientists and support staff between them. Maybe it’s purely coincidental.

Anyway, it happened again this week. After looking at the offer, I recycled it…and kept the magazine to read. (You can call Science a journal if you wish; to me, it comes off as a serious science-oriented magazine that happens to include a few peer-reviewed papers.)

I recycle the offers for two reasons:

  • It offends me that I’m offered Science for $99, with a renewal price that wouldn’t be higher than $153 (and probably lower), while if my library wants to subscribe to the print edition, it will cost them $1,282. I don’t know of very many magazines with the effrontery to charge a library nearly nine times as much for a print magazine as they charge an individual, although for scholarly journals that may be typical. Or not.
  • The less serious reason: I love magazines. I love books. I love some TV and movies. I love doing stuff on the computer. If I took Science with its weekly schedule and fairly meaty content, I’d have to stop taking at least half of the other magazines I read or give up on books altogether. Not gonna happen. (If anyone wonders why I don’t subscribe to The New Yorker, just reread this bullet. Also one reason I didn’t renew The Economist, although in that case going from free-for-airline miles to $100 or so made the decision easy.)

No deeper message. Just a quick note.


Cites & Insights 15 now available in paperback

November 3rd, 2015

Cites & Insights 15: 2015 is now available as a 354-page 8.5″x11″ paperback, combining all eleven issues plus indices (exclusive to the book).

As usual, the price is $45.00 (of which roughly half goes to support Cites & Insights).

This year is especially strong on open access (including the most complete survey ever done of gold OA activity) but also includes major essays on the Google Book Project, books, social networks, fair use and more.

(If you buy it today or tomorrow, you can get free shipping by using coupon code USMAIL11–capitals do count and the last two characters are ones. The coupon code is good through November 4, 2015.)

As close as I’ll get to NaNoWriMo

November 2nd, 2015

Or, as I like to think of it, the Misspelled Robin Williams Memorial process…

Anyway, you could think of the December 2015 Cites & Insights as my NaNoWriMo with just tiny little deviations. After all, it is novel-length (as defined by the Science Fiction & Fantasy Writers of America, SFWA, as far as I know the only list of lengths for this sort of thing: if it’s over 40,000 words, it’s a novel), and it’s appearing in November.

The only little tiny deviations from NaNooNaNooNoWriMo:

  • The issue isn’t quite 50,000 words long–it’s 48,012.
  • It’s nonfiction.
  • Although it appears in November, I wrote it in October, and OctNonWriMo doesn’t exist. Yet.
  • A large portion of it isn’t my writing, it’s excerpts from other writing. (How large a portion? To my surprise, apparently less than half–deleting every quoted paragraph that’s not quoting me brings the word count down to 26,851 words.)

But hey, other than those four tiny quibbles…

In any case, it’s as close as I’m ever likely to get to NaNoWriMo.

Cites & Insights 15:11 (December 2015) available

November 2nd, 2015

The December 2015 issue of Cites & Insights (15:11) is now available for downloading at http://citesandinsights.info/civ15i11.pdf

This issue is 58 pages long. If you plan to read it online or on an ereader (ebook, tablet, whatever), you may prefer the single-column 6″ x 9″ edition, 111 pages long, at http://citesandinsights.info/civ15i11on.pdf

This issue contains one essay:

Intersections: Ethics and Access 2015  pp. 1-58

No weird old tricks for reducing belly fat, but 102 items worth reading in a baker’s dozen of subtopics related to ethics and access (open and otherwise)–and #25 may astonish you! Or not.

No, it’s really not a listicle–otherwise I’d have to find 102 ads and free (or plagiarized) illustrations. It’s a bigger-than-usual roundup, with just a little humor (and a few exclamation points–and one interrobang).