To some extent, I saw this one coming. The first question, answer, and expansion in OCA and GLP 1 weren’t really designed to provoke, but I knew they might be considered provocative.

I thought it might serve as a test of reading comprehension: Would the Project Gutenberg supporter be able or willing to understand the distinctions I was making (between the plain text of a book and the pages of the book, for example) or would they just fulminate that I was demeaning PG and the Michael who made it all possible?

Turns out the first response came in two days ago–but it came in to the special email address for those who verify up front that their comments can be published, and I don’t check that email address very often. (This is the first such submission…)

Two more arrived today.

Bruce Albrecht sent the first response, a long and thoughtful one. He begins:

I would like to take exception to the several places in the December
2005 edition of Cites and Insight
where you dismiss the Project
Gutenberg as merely a library of e-texts as opposed to e-books, which
are clearly better.

In the lowest common denominator form, PG texts are, as you say, only
etexts. However, many, if not most of the new works contributed to PG
these days from Distributed Proofreaders also include a secondary HTML
version which include all the features of an e-book that Karen Coyle
claims work from PG lack.

He goes on to note an example and all of its features, explain what Distributed Proofreaders is doing, and question my association of typography and page design with the book itself as written by the author, as opposed to the particular edition.

Well, there aren’t “several places…where I dismiss” PG as merely a library of etexts; I only see one place. But never mind. I was mistaken.

I plead guilty: I had grown so sick of Michael Hart’s inflated ego, his wayward ways with facts and figures (particularly back in the late unlamented “Ask Dr. Internet” days), and other aspects of His Project that I hadn’t gone back to Project Gutenberg in a long time. Everything Hart writes continues to emphasize plain ASCII as what PG is all about. When I did visit the site, I still found that emphasis–although there’s a mention of other formats hidden near the end of a very long FAQ.

And, sure enough, if you start clicking on entries in the catalog, eventually you’ll wind up with some HTML offerings (even a PDF or two!).

Because there’s at least one PDF, the answer to my first question (“How many books has Project Gutenberg digitized and made available online?”) should not be “None” but “A few.” Further clarification: There are several thousand “ebooks” by definitions I’d agree with, namely the HTML versions, but only a few digitized books–that is, digital replications of book pages. End of further clarification 12/3/05. The general answer is correct, however: PG’s primary thrust as explicated endlessly by its founder continues to be etexts, not ebooks (and I would note that Hart would probably take offense at the first sentence in the second paragraph of Albrecht’s letter). But even HTML digitizes the text and organization of a book, not the edition itself. (Google’s public domain offerings, as currently planned, offer the digitized editions, but not in ebook form…)

Then there’s the issue of whether an ebook should be a digital facsimile of a print edition, as opposed to a properly-organized version of the work itself. In this case, there are good arguments to be made on several sides. For some purposes, the digital facsimile is superior; for many purposes, the HTML (or TEI, or whatever) version of the work is superior. I think it’s legitimate to call both of these ebooks.

So, to the extent that PG does now include proper HTML versions of works, I’ll say that there are ebooks on PG.

As to the other two pieces of mail:

  • In one case, I’m waiting for permission to publish, since the mail came in to a different address. The correspondent raises a similar issue in briefer form, and says “it’s not fair to represent PG’s content so inaccurately.” My best defense is that PG’s founder makes such a point of representing PG that way that it’s easy for mere mortals to get confused.
  • The other case includes a “response” from Michael Hart himself, and since it was posted to a list (and forwarded from that list), I don’t feel I need his permission to quote some excerpts. The problem is that he was responding to excerpts from the article (I assume, given the responses), which leads to some silliness. He seems to assume that I’m holding Google up as the paragon of ebook provision. But Hart’s derision when it comes to caring about typography and page design can’t be missed; apparently caring about anything related to print is “obsessive.” I’m not sure what, if anything, I’ll actually use from Hart’s stuff; it’s too easy to quote without comment, since he doesn’t need much rope…

There will certainly be feedback/followup in the next issue, maybe even a separate essay. I see one discussion possibility already (having nothing to do with pure ASCII)…

