The (Fuller) Open Access Landscape: Progress Report 1

While donations/book purchases to support this follow-on to The Open Access Landscape have so far been less than overwhelming, it’s a project that interests me (and would lay the groundwork for a 2016 revisit covering pretty much all of Gold OA for 2011-2015), so I’ve started–but don’t know whether or when I’ll finish.

Changes in Grading and Approach

On reflection, I’ve made some changes in grades (existing and future) and my approach. To wit:

  • A$ has been collapsed into A.
  • D and its various subgrades have been collapsed into A-with-subgrades, since “may not be included in DOAJ at some point” really isn’t what I’m looking at. The new subgrades of A include C (nothing since 2012 or explicitly ceased/merged), D (apparently dying, none in 2014), E&H (as in old D/E and D/H, and these are judgment calls), N (2014 only and fewer than 5), O (the only new subgrade, “Oneshot,” for a journal that has articles in only one year of 2011-2013 and none in 2014), and S (small, the biggest group).
  • B now has subgrades as appropriate to indicate why something may need investigation: A (author repetition), E (problematic English in a journal that claims to be English or support English), G (garish or other site problems), I (questionable “impact factors” prominently featured), M (minimal information), P (peer review/turnaround/editorial issues), Q (questionable claims), T (questionable article titles on casual inspection), and O (other–usually a mix of problems). In practice B journals frequently have more than one issue, and I note the first one encountered.
  • C now has subgrades as appropriate to indicate why I regard it as better to avoid: A (APC missing or hidden, the most frequent cause), E (English so bad–in an English-language journal–as to be unworkable), F (falsehoods on the site), P (implausible turnaround/peer review), S (incompetent site), T (absurd article titles or approach), O (other: usually a mix of problems).
  • X, excluded from study, combines old grades E-X and now has these subgrades: E (empty since at least 2010), M (either Malwarebytes, McAfee Site Advisor, Chrome defenses or Windows Security says the site has security issues–I’m tired of getting viruses from “OA journals”), N (not OA, but my definition’s looser these days–still, required registration or an embargo are deal-killers), O (opaque: undated issues or otherwise unable to count articles by year–and I do try DOAJ as well), P (parking page or other non-journal page), U (reachable but unusable site), X (unreachable, trying both Excel-to-Chrome and direct-in-Chrome search, but NOT title search: if a journal can’t be bothered to update its URL in DOAJ or provide a link, given that I’m using a June 8, 2015 DOAJ download, it’s effectively incompetent), and the new T (Chrome’s translation did not make it possible for me to evaluate the journal for APC, peer review, OA and article count).

Yes, I’m using Chrome as my default browser (although for many uses I prefer Firefox), for one simple reason: built-in page translation, so I can attempt to evaluate journals that don’t have English interface options.

Cleanup

I went through the spreadsheet used for the current set of reports, eliminating a handful of duplicates, changing grades to the new system, and revisiting journals where the 2013 count is less than half the 2014 and 2012 counts (in quite a few cases, these are annuals that show up very late, and I filled in the 2013 counts).

The cleaned-up base spreadsheet has 6,465 journals, including 5,533 A, 495 B, 397 C, and 40 X–that’s right, in the process of cleaning up 40 journals became unusable. Some journals changed grades because late-2014 articles moved them or because of other reasons. (The old E-X grades are not part of the base spreadsheet: sometimes journals come back to life, so I’m revisiting those.) Incidentally, of the 40 new Xs, 18 are for malware, and in 9 cases journal sites are now parking pages.

Download and crossmatch

I exported the DOAJ .CSV metadata on June 8, 2015. It included 10,611 journals.

  • Of those, five reported a 2015 start date. I eliminated those–this is still a 2011-2014 study–leaving 10,606.
  • I checked both URLs and titles for duplications. In 15 cases (29 journals), I made changes to disambiguate them (usually changing to an alternate URL for one title). In six cases (three journals), the duplication in URL was because the journal appeared under two titles (one English, one not); I eliminated the non-English duplicates. At this point, the DOAJ set included 10,603 journals.
  • Using Vlookup (with “false” to allow only exact matches), I matched URLs in the Base and DOAJ spreadsheets. I was delighted to find that 6,167 journals had exactly the same URL in June 2015 as in May 2014. I saved off the Base subset (all but 298) as Base_URL and deleted the DOAJ matching subset, leaving 4,436 journals.
  • Again using Vlookup, I matched journal titles in the remaining Base subset and remaining DOAJ subset. There were 191 matches. For these, I replaced the old URL with the new URL, saved the Base subset as Base_Title, and deleted the matches from the DOAJ subset, leaving 4,245 journals.
  • With only 107 Base journals left unmatched (298 less the 191 title matches), it was reasonable to do visual title matches (the Base titles had been normalized in a way that could obscure some exact matches). This yielded 27 new matches, added to Base_Title (with the DOAJ title and URL), leaving a subset of 80 Base journals not found in the DOAJ download–and 4,218 DOAJ titles to investigate (presumably including many of the 800-odd titles graded E-X in the earlier study).
  • Combining the matched Base subsets (6,167 by URL, 191 by exact title and 27 by visual title match) yields a new Base_Curr of 6,385 journals, a Base_Nomatch remnant of 80 journals (I’ll look at those again when everything else is done–some probably failed new DOAJ criteria, some probably for other reasons), and 4,218 journals in the new DOAJ_P2 spreadsheet waiting to be checked.
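For anyone who would rather script this kind of crossmatch than do it with Vlookup, here is a minimal Python sketch of the two exact-match passes (URL first, then title). It is not what I actually ran: the function name `crossmatch`, the `title` and `url` keys and the sample journals are all made up for illustration, and it assumes duplicates have already been cleaned up as described above. The visual title-matching pass is a human step and isn't covered.

```python
def crossmatch(base, doaj):
    """Two exact-match passes, on URL and then on title.

    base, doaj: lists of dicts with hypothetical keys "title" and "url".
    Returns (matched, base_nomatch, doaj_remaining).
    """
    matched = []
    for key in ("url", "title"):
        # Index the remaining DOAJ rows by the current key. Exact matches
        # only, first occurrence wins (like VLOOKUP with FALSE).
        first_index = {}
        for i, row in enumerate(doaj):
            first_index.setdefault(row[key], i)

        used = set()
        still_unmatched = []
        for row in base:
            i = first_index.get(row[key])
            if i is None or i in used:
                still_unmatched.append(row)
            else:
                used.add(i)
                # Keep the Base row but adopt the current DOAJ URL.
                matched.append({**row, "url": doaj[i]["url"], "matched_on": key})

        # Carry only the unmatched rows into the next pass.
        base = still_unmatched
        doaj = [row for i, row in enumerate(doaj) if i not in used]
    return matched, base, doaj


# Made-up sample data: A matches on URL, B only on title, C and D not at all.
base_rows = [
    {"title": "Journal A", "url": "http://a.example"},
    {"title": "Journal B", "url": "http://b-old.example"},
    {"title": "Journal C", "url": "http://c.example"},
]
doaj_rows = [
    {"title": "Journal A", "url": "http://a.example"},
    {"title": "Journal B", "url": "http://b-new.example"},
    {"title": "Journal D", "url": "http://d.example"},
]

matched, base_nomatch, doaj_p2 = crossmatch(base_rows, doaj_rows)
for row in matched:
    print(row["title"], "matched on", row["matched_on"], "->", row["url"])
print("Base without a match:", [r["title"] for r in base_nomatch])
print("DOAJ still to check:", [r["title"] for r in doaj_p2])
```

The bookkeeping on the DOAJ side works the same way as in the steps above: 10,603 less 6,167 URL matches, 191 title matches and 27 visual matches leaves 4,218.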

Starting the slog

I’ve now checked the first 100 of the 4,218 (alphabetically by title). In one full day–with no yard work, no writing, nothing else–I managed 75 titles. At that rate, it would take 57 days to finish the scan, which I could comfortably do by my original September 14, 2015 target date.

But, of course, I rarely have full days: there’s yardwork (still 160-200 sq.ft of “grass” to remove in front, little by little, plus trips to get more rocks, plus actual weeding), there’s Cites & Insights unless I set it aside for the next three months, there’s hiking, there’s shopping, there’s lots of other things. My best guess is that I could average about three, maybe 3.5 “full day equivalents” per week–which makes this a 19-week project (or more). That takes me well into October, and maybe November. Unless I give up.
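The arithmetic behind those estimates, as a small Python sketch. The figures are the ones given above (4,218 titles, 75 titles in one full day, three to 3.5 full-day equivalents a week); the function name `estimate_weeks` is just for illustration, and one day’s pace is a thin basis for any projection.

```python
import math


def estimate_weeks(titles, titles_per_full_day, full_days_per_week):
    """Return (full days needed, calendar weeks at the given weekly pace)."""
    # Round up: a partly used day still has to happen.
    full_days = math.ceil(titles / titles_per_full_day)
    weeks = full_days / full_days_per_week
    return full_days, weeks


for pace in (3.0, 3.5):
    days, weeks = estimate_weeks(4218, 75, pace)
    print(f"{days} full days at {pace} per week is about {weeks:.1f} weeks")
```

At three full-day equivalents a week, the 57 days stretch to 19 weeks; at 3.5, a bit over 16.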

The first 100 are almost certainly not representative at all. How could they be?

For what it’s worth, there are 77 A (of which 6 have APCs*), six B, no C, and 17 X, and the A-C group published 2,405 articles in 2014.

I checked the first 100 against the beginning of the old E-X spreadsheet (all of which would be X in the new scheme). Five that were E-X are still X, while 14 are now A or B.

Continuing

I’ll keep going until I lose interest (or find that it’s really running too slowly) or I finish. I’m convinced this will yield an even more interesting look at gold OA 2011-2014 and a nearly complete look at the field: of the first 100, four came out XT (that is, Chrome/Google’s translation wasn’t enough for me to evaluate the journal) and two XO (opaque). That’s almost certainly not a meaningful sample, but if it were, I’d be happy enough.

If you’ve read this far you must find this research interesting and possibly worthwhile. The best way to encourage me to keep going is to contribute to Cites & Insights (at the link–the home page), noting that $25 gets you the PDF of the current study (and a $7 print book offer) and $50 will also get you the PDF of the more complete study, if I finish it.

Of course, in the hour I spent writing this post, I could have evaluated five to ten journals. Oh well.


*Updated 2 p.m. June 11, 2015: Of the next 40 journals (101-140), 34 have APCs, just to show how meaningless a small sample is. Why so many–and so fast, as it happens? I hit “Advances in…” and most are new-in-2014 Hindawi journals, all with $600 APCs and all very easy to deal with. For that matter, Chrome/Google did just fine with the others, some of them in Chinese.
