Archive for June, 2009

The dynamics of spam on a semi-dormant blog

Tuesday, June 30th, 2009

It’s time for a serious post: say, a careful analysis of patterns of spam attempts on a widely-read but essentially dormant blog.
The blog in question is now entitled “Walt, Even Randomer” and combines four years of Walt at Random archives with the occasional new post that isn’t right for the new home of Walt at Random–e.g., reviews of old movies, ALA schedules, pure copies of posts from other blogs (except for the announcements of new Cites & Insights issues, which do appear on both blogs).
The semi-dormant blog was averaging 3,000 page views per day when the active portion moved and has a Google Page Rank of 5 (sometimes 6, if the wind’s blowing in the right direction), so it’s a target for spammers, particularly link spammers. It also has Spam Karma 2, so very few spamments get through. (Comments also close automatically six months after a post appears, since many link spam attempts are on very old posts–and I disabled linkbacks long ago, since the spam-to-signal ratio was just too high.)
The settings for Spam Karma 2 are severe enough that, once in a while, a legitimate comment gets moderated, so I try to check the spams before deleting them or letting them get deleted. (So far, that doesn’t seem to be either an issue or a possibility at ScienceBlogs–and one spam comment, a very clever one, did make it through to one post already.)

Anecdata

That’s what this is, to be sure–at best, anecdotal data or anecdata*. It has all the scientific rigor of talk radio.
That said, and (as regards the lead sentence) noting that I don’t do emoticons, here are a few notes on the varieties of spam encountered over a brief study period.

Very informative!

Complimenting the blogger seems like one common way of ingratiating spam. For example:

I found walt.lishost.org very informative. The article is professionally written and I feel like the author knows the subject well. walt.lishost.org keep it that way.

This might work better if the domain name was the name of the blog, to be sure: “I found Walt at Random very informative” is a tad more convincing. Six payday loan companies offered this sentiment.

I Love the way you write

You can’t be too effusive. This comment continues “…thanks for posting.” You’re certainly welcome, even if you’re commenting on entirely bland announcements with no writing style at all.
Twenty-five people love the way I write–and, oddly enough, although each person has a different name and gmail account, there’s the same URL for all 25. (In every case, that URL is on a blacklist. On the other hand, the posts to which these comments were attached would tend to make me wonder just why my style was so admired–and why the comments arrived in pairs, but with identical text.)

Other compliments and apparently specific questions

“What is captcha code? pls provide me captcha codes or plugin, thanks in advance.” Sorry, yet another payday loans company, but I don’t provide that service.
“hey .. way to go with this post .. i’ll need more tips tho so [remainder omitted]” Much as I’d love to help out a low-cost loans provider…
“good work, hope you make more related posts! will keep an eye on this blog ;)” Given the nature of the URL provided, you’re too busy eying sexcams, I’d think. (Two of these, different gmail accounts.)
“walt.lishost.org – da best. Keep it going! have a nice day” Some Russian company that can post within five seconds of reaching the form.
“I always enjoy finding a ‘good’ blog. Thanx and I’m going to add you to my RSS feed.” Another mystery “Flash Gordon” poster–and it may be worth noting that most of these were on LLN Highlights reposts.
“great stuff thx things make since now hehe good concept” – This one, linking to a supposed boot seller, starts to move over into the dada area…
“This is a fast loading page, do you know who the webhost is and if they are cheap?” Nah, yet another sex seller, the blog’s just there–that URL with “lishost.org” in it is meaningless.
“Hi, I love your work.” Concise, if from another sex seller (and on an oft-spammed post that should have no comments at all).

The dada element

I think most of the spamments fall into this category–text that’s hard to take seriously if you actually read it. Just a few examples, including only the first few words of what are often lengthy (typically around 2,400 characters) spamments:

  • “Stone happy rich source chemical formula…”
  • “Within the blew and terbinafine…”
  • “Parry con had agreed free circus…”
  • “Bill heard through this denavir cream…”
  • “Unless they destroying their altace photo…”
  • “They gave horrific implicatio chemi…”

Most of these also seem to link to a single URL or one of several related URLs. I lost count of how many there were–let’s just say dozens (scores, probably–more than half of all the spamments).
If someone was willing to accept all these comments, then filter out all the obvious spam words (drug names, etc.), you could make some interesting found poetry from the remnants. I can just see someone with a goatee and a beret, sitting in a smoky Berkeley cellar reciting the results…a few decades ago.

Flat-out spam. Deal with it.

These are the comments that start with a link and are, in essence, nothing but links. Some include long lists of links (43 seems to be typical), some only a few. In a way, they’re the most pathetic form–easiest to block and obviously spam. Only about half a dozen of these, once all the dada-found poetry entries are eliminated.
Ah, but there are three variations:

  • “x nude” followed by URLs (where “x” can be some surprising names): Half a dozen.
  • Some nonsense word (eldbberyj, tdofnnkw, kxxlxhud…) followed by URLs: Another half dozen.
  • “x” Sex Tape, or just “x” followed by URLs (where “x” can again be a little odd): Only five of those.

The rest…

What else? There’s a long, long story about a kid and his computer; I saw that one three or four times. There’s a string of nonsense characters followed by “Comment 1” or “Comment 3” or “Comment 5” or whatever–apparently testing to see whether anything makes it through. (If you’re doing blog searches for the result, well, sorry, Charlie, it didn’t and won’t.)

Serious conclusions

  1. Spam is a damn nuisance. In four short years, a blog with modest readership in a narrow area had more than 31,000 spam attempts…and counting.
  2. Spammers are remarkably amateurish. Even the social-engineering spams were so badly done as to be laughable. If you’re going to flatter me about my writing, at least choose a post with some vague evidence that I actually wrote something!
  3. It must work somewhere! If spamments weren’t improving link scores and Google page ranks, they would disappear.

Now, back to skimming each day’s set and sending them off to perdition…


*Updated 7/2/09: While I don’t remember ever hearing “anecdata” before, I had no reason to believe it was original. It isn’t, as a belated search shows. Some usages are similar to mine; some, unfortunately, seem to suggest the legitimacy of treating several anecdotes as being data. Sad, that.

Fading away? (More metablogging)

Sunday, June 28th, 2009

In the past few days, one of the best libloggers called it quits: She explicitly said there won’t be any more posts on that blog.
On its own, noteworthy as it is, this probably wouldn’t prompt a post from me. The writer isn’t going away, the archives aren’t going away, and the circumstances may be unusual.
But there’s a context that might be worth discussing and pursuing further–actually two contexts, one only marginally related.

Direct context

One comment on this shutdown said that, according to the writer and a colleague, this particular blog was the only consistent liblog around (not in those words)–in essence, that most liblogs (blogs created by library-related people but not as official library blogs) aren’t showing much action these days.

I didn’t take offense at the remark; this blog hasn’t been consistent since it began, and has never had a steady stream of meaningful posts. On a more general basis–well, I think the finding’s a little uncharitable, but I’m not sure it’s entirely wrong.
Fact is, a fair number of experienced libloggers have cut way back–disappearing for weeks or months at a time when they previously posted something every day or three. How many and to what extent? That’s a tough one; see below under “Documenting whatever it is.” I don’t think it’s entirely anecdotal, but that’s an anecdotal judgment.

Indirect context

Setting aside liblogs, what about library blogs–blogs created by libraries?

There seems little doubt that quite a few library blogs started without much planning and were deserted early on. Whether or not anyone actually said “Every library should have a blog” (Steven Cohen is on record as saying he used to “think” that, but the direct statement, in so many words, doesn’t show up as a first-person statement, although James LaMee comes very close), that was certainly the implication of some early Library 2.0 statements. (Oh, look: Cliff Landis apparently did say every public library should have a blog, unless he’s misquoted here.)
But saying that lots of library blogs have been abandoned is, at best, anecdotal. (If you use the informal measure “One, two, three, lots” then all you need do is find four abandoned library blogs: Easily done. If by “lots” you mean “some substantial percentage of all that were created,” that requires something more than anecdotal evidence.)

Slightly better than anecdotal

In preparing the two soon-to-go-out-of-print books on library blogs, I looked at how 231 academic library blogs and 252 public library blogs were doing in March-May 2007. While certainly the largest sets of library blogs ever studied in any detail, neither set was remotely comprehensive, and that was deliberate:

  • All blogs had to be in English.
  • Blogs had to have started no later than December 2006.
  • Blogs had to be “reasonably active” in early 2007–that is, at least one post in two out of three of the study months (March, April, May 2007).
  • Blogs had to be reachable in mid-2007 (June-August), when I was doing the studies.
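
The two date-based rules above (start no later than December 2006; at least one post in two of the three study months) are mechanical enough to express as a check. This is a minimal sketch, assuming each blog’s posts are available as a list of dates; the function name and input format are hypothetical, and the English-language and reachability rules stay manual judgments that it doesn’t model.

```python
from datetime import date

# The three study months named in the criteria: March, April and May 2007.
STUDY_MONTHS = [(2007, 3), (2007, 4), (2007, 5)]


def meets_date_criteria(first_post: date, post_dates: list[date]) -> bool:
    """Hypothetical check for the two date-based inclusion rules.

    first_post: date of the blog's earliest post.
    post_dates: dates of the blog's posts (dates outside the study months are ignored).
    """
    # Rule 1: the blog must have started no later than December 2006.
    if first_post > date(2006, 12, 31):
        return False
    # Rule 2 ("reasonably active"): at least one post in two of the three study months.
    months_with_posts = {(d.year, d.month) for d in post_dates}
    active_months = sum(1 for month in STUDY_MONTHS if month in months_with_posts)
    return active_months >= 2


# A blog started in mid-2006 with posts in March and May 2007 qualifies.
print(meets_date_criteria(date(2006, 6, 1), [date(2007, 3, 5), date(2007, 5, 20)]))  # True
```

Counting distinct (year, month) pairs rather than posts means a burst of ten posts in a single month still counts as only one active month, which is the point of the “two out of three” rule.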

For public library blogs, I began with a set of 360 English-language candidates, of which at least 68 were defunct, 19 more weren’t reachable, 16 were too new–and at least 29 didn’t meet the “reasonably active” mark.
For academic library blogs, I began with a set of more than 400 blogs–of which at least 54 were defunct, 23 more were unreachable (or not blogs at all), 23 turned out not to be in English, and 22 weren’t “reasonably active.”
So, realistically, my studies excluded around 30% of “visible” public library blogs and 42% of “visible” academic library blogs, keeping the more robust specimens. (“Visible”: There were probably scores and possibly hundreds of other library blogs I didn’t know about, because they weren’t in one of the wikis tracking these things or otherwise evident.)
Similarly, while The Liblog Landscape 2007-2008 includes 607 liblogs, it excludes more than a hundred other English-language liblogs.
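As a rough check on the 30% and 42% figures above, taking the starting and final counts at face value and using 400 as a stand-in for “more than 400” (an assumption, which makes the academic figure a lower bound):

```latex
\[
\text{public: } \frac{360 - 252}{360} = \frac{108}{360} = 30\%
\qquad
\text{academic: } \frac{400 - 231}{400} = \frac{169}{400} \approx 42\%
\]
```

Because the academic candidate pool was somewhat larger than 400, the true excluded share for academic blogs is a little above 42%.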
I did recheck the blogs that were in the study in a quick visit done while preparing for the 2009 OLA SuperConference. The results of that recheck appear in the February 2009 Cites & Insights, or separately as “Shiny Toys or Useful Tools?”–PDF rather than HTML because it was too much hassle to try to get the eight graphs into an HTML version.
Briefly, here’s what I found in terms of survival–from May 2007 to December 2008 for library blogs, from May 2008 to December 2008 for liblogs:

  • Of the 231 academic library blogs, 17 (7%) had disappeared entirely–but of the remaining 93%, more than three-quarters had at least one post within a month before the date checked and fully 92% had at least one post within 120 days.
  • Of the 252 public library blogs, 15 had either disappeared, gone behind a security wall, or changed into parking pages or non-blogs. Of the remainder, two-thirds had at least one post within a month before the test date and 89% had at least one post within 120 days.
  • Of liblogs–where some were already moribund by May 2008, and where I’d cut the list down to 570 by excluding a handful of non-English blogs and ones that had already disappeared (entirely, including archives) by March 2008–some 70% had posts within a month of the test date (47% within a week), and 87% had at least one post within 120 days.
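
The library-blog survival percentages follow directly from the counts. A small sketch that recomputes them, using only the numbers reported above (for public library blogs, “gone” lumps together blogs that disappeared, went behind a security wall, or became parking pages or non-blogs, as in the text):

```python
# Counts from the May 2007 to December 2008 recheck: (blogs studied, blogs gone).
RECHECK = {
    "academic library blogs": (231, 17),
    "public library blogs": (252, 15),
}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


for label, (studied, gone) in RECHECK.items():
    remaining = studied - gone
    print(f"{label}: {percent(gone, studied)}% gone, {percent(remaining, studied)}% remaining")
```

That reproduces the 7% (and 93% remaining) given for academic blogs; the public-library counts work out to roughly 6% gone and 94% remaining.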

So, at a gross measure, it’s fair to say that library blogs and liblogs that were established in early 2007 or early 2008 respectively had not, by and large, been abandoned by late 2008.
But those are gross measures and don’t say much about the reality of the blogs.
Additionally, there’s a bunch of anecdotal evidence that blogs may be suffering more over just the past few months, thanks in part to competition from Twitter, Friendfeed, Facebook et al.

Documenting whatever it is

In December 2008, I posted a note citing a kind comment from Kathleen de la Peña McCook, who said “I think maybe all this (blogs) may fade and that your books may document the movement.”

At the time–way, way back in the dusty distance of six months ago–I said:

I think blogs are already fading in one sense, but I also think it’s unlikely that they’ll fade away any time soon–no more likely than, say, the end of mail lists or email itself. I believe they’ve become reasonably well established as one medium; the uses for that medium are changing as other media emerge–so, for example, Twitter and delicious are probably reducing the number of pure link posts.

Now? I have no idea…and I’m intrigued on a couple of levels. Maybe (maybe) intrigued enough to waste a little more time on the subject. (Yes, I’d still love sponsorship for research–and more book buyers. No, I’m not holding my breath. I’d rather have a sponsor for Cites & Insights itself…)
That time wastage might come in two varieties:

  • A quick-and-dirty, but non-anecdotal, sweep of the 483 library blogs studied in 2007. For this sweep, I’d probably just look at, say, May 2009, and measure things that take almost no time to measure–most recent post (in days), number of posts in that month, maybe number of comments in that month (if any). I might also reclassify each blog as to whether it’s a “functional blog” (where the blog is really a publishing mechanism that’s embedded on the library’s website and has little or no separate presence–e.g., new events, new books lists).
  • A more substantial repeat visit to the trimmed list of 570 liblogs from the 2007-2008 study–anywhere from the q&d approach mentioned above to a full-scale study, or possibly something in the middle that pays more attention to how individual blogs have changed and, specifically, how “established” or “name” blogs have or haven’t done.
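
The quick-and-dirty sweep in the first bullet comes down to three numbers per blog. A minimal sketch, assuming the post and comment dates for each blog have already been gathered somehow; the function name, the check date and the input format are all hypothetical:

```python
from datetime import date
from typing import Optional


def sweep_metrics(
    post_dates: list[date],
    comment_dates: list[date],
    check_date: date,
    year: int = 2009,
    month: int = 5,
) -> tuple[Optional[int], int, int]:
    """Return (days since most recent post, posts in month, comments in month).

    Days since the most recent post is None if no post falls on or before check_date.
    """

    def in_month(d: date) -> bool:
        return d.year == year and d.month == month

    # Most recent post on or before the check date, measured in days.
    earlier = [d for d in post_dates if d <= check_date]
    days_since_last = (check_date - max(earlier)).days if earlier else None
    # Activity counts for the chosen month (May 2009 by default).
    posts_in_month = sum(1 for d in post_dates if in_month(d))
    comments_in_month = sum(1 for d in comment_dates if in_month(d))
    return days_since_last, posts_in_month, comments_in_month


posts = [date(2009, 4, 10), date(2009, 5, 3), date(2009, 5, 28)]
comments = [date(2009, 5, 4)]
print(sweep_metrics(posts, comments, check_date=date(2009, 6, 30)))  # (33, 2, 1)
```

The “functional blog” reclassification is a human judgment about how a blog is used, so a sketch like this can’t capture it.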

Are liblogs, or particularly the old reliable big-name liblogs, fading away? I honestly don’t have a good sense one way or the other.
Are library blogs surviving or prospering? Ditto.
Does anybody care? Damned if I know.

Preliminary ALA schedule

Sunday, June 28th, 2009

I’m putting this on Walt, Even Randomer because I assume most library subscribers are likely to have this—and it’s wholly irrelevant to the non-library folks who read ScienceBlogs blogs.

If you want to get together, suggest a meal, invite me to a reception, whatever, it’s possible: Send me email (waltcrawford at gmail dot com).

Boldface items are mandatory or highly probable, underlined probable (for work reasons), any others entirely optional.

Friday, July 10

American 828, SJC 7:25 a.m.-O’Hare 1:35 p.m.

Hotel: Chicago Hilton, 312-922-4400

No specific plans at this point.

Saturday, July 11

  • 8-10: LITA IG and Committee chairs, Palmer House, Grand BR
  • 10:30-12: Targeted marketing (PLA): MCP W-190b
  • 1:30-3: LITA Publications Committee, Palmer House, Indiana Room
  • 3:30-5:30: Leadership development in transition (ALCTS): MCP W-196a

Sunday, July 12

  • Listening to the customer: MCP W-179
  • 10:30-noon: Our town, common ground: Academic libraries’ collaboration with public libraries – Hilton Williford
  • 3-4: LITA awards reception, Intercontinental, Empire
  • 3:30-5:30: The future is now: Planning & staffing for change (PLA) – MCP W-180
  • 5:30-8: OCLC bloggers salon, Hilton, Boulevard Room C (2nd floor)

Monday, July 13

  • 8-10: Finding the leader within you (AASL) – MCP W-175b/c
  • 1:30-3: Leading the way: PLA fellows… – MCP W-175b/c

Tuesday, July 14

American 309, O’Hare 10:05 a.m.-San Jose 12:35 p.m.

Culture clashes II: PDF, XML and what’s in it for me?

Wednesday, June 24th, 2009

When I wrote this post, I left out a whole second “trigger” because of time and energy.
That trigger–once again, wondering whether my humanities background (rhetoric major, math minor) leaves me simply unable to cope with the true Scientific Mind–regarded the format used for publication.
Or, to put it another way, the widespread and vehemently-expressed view that PDF sucks (to use a polite version).
What I saw, in several conversations, was a seeming demand from text-miners that everything must be in HTML (or, better, XML) so it was easy to mine, with a complete disdain for layout and typography as irrelevant. (I can only imagine Donald Knuth’s response to the concept that typography and layout don’t matter…)

Why some of us humanists use PDF

Because we care about typography. Because we care about the presentation of what we’ve written. Because PDF–and, of portable formats, only PDF–can assure us that the typefaces and layouts we’ve chosen will be rendered properly for the reader.
And because it’s easy–pretty much automatic on the Mac, and not difficult on the PC (there’s a free Office download to define a PDF printer; I use Acrobat because it produces much smaller PDF files and because it can combine many PDFs into a single file, but for 95% of users, the free download’s good enough).

Getting from there to HTML

So you want HTML? Make it easy. Actually, for Word 2007, it isn’t bad: Save as Web page (filtered), and you get not-too-ugly HTML. (Since .docx is actually an XML package, it probably should be better than it is.) But you have to tune an HTML-version stylesheet if you really want to do both well–one that only uses “easy” typefaces, for example. It won’t be elegant HTML, but it will work.
But, even here, what’s in it for me? Can you demonstrate that I’ll get more money, more fame, or even significantly more readers by taking those small steps?
“It makes it easier for me to plunder your text for my own purposes” is not, I hate to say, a terribly convincing reason. It might be for you, but it isn’t for me.
Still…after years of doing only PDF for my own peculiar ejournal, I started doing Word’s filtered HTML for most essays, because it did seem to serve some subset of readers–and it didn’t add substantially to the production task. But whenever I read one of the HTML versions, I wince a little: It’s just not as good as the PDF.

Going beyond HTML

But, you know, I think you want more than HTML. I think you want semantics–XML or better.
Provision of good-quality HTML from a regular writing-and-layout stream is at least plausible, with no real extra effort on the part of the writers and editors.
Provision of semantics, though–that’s a huge additional effort, and I don’t believe it’s one that’s readily automatable for non-trivial instances.
Which magnifies the question: What’s in it for me?
I’m honestly interested in the answers. “Some neato research down the line that will earn someone else grants and tenure” may not be a wonderful answer. Just sayin’


Update, June 25, 2009:
Based on one comment (not here–ah, the multifarious conversational channels!) I should stress that, when I say “What’s in it for me?” I’m not suggesting that there are no reasons to use HTML. Of course there are. (Hmm. I’m writing this in HTML, because it suits blogging–and, unlike WordPress’ editor, this editor is pretty much raw HTML, other than automatic paragraph breaks.)
I’m suggesting that there are also legitimate reasons to use PDF.
Really, “what’s in it for me?” (a phrase I rarely use) has more to do with demands for HTML–not for readability, but for text-mining–and pressures to do more than HTML. And the constant “PDF sucks!” refrain.
As noted above, I do provide HTML versions of (most) Cites & Insights essays (except for a small number that just don’t work well that way and one “print bonus” feature that appears sometimes)–because some people asked me nicely to do so as an alternative for those who really want to read online, and because it had been a while since people were demanding that my free publication should be revamped to suit their own preferences.
(Yes, I do mean demanding, in at least one case with fairly strong language. My standard response, after the unmailed two-word/seven-letter one, was that there are lots of other things to read on the web…)

Quick notes on research and information science

Sunday, June 21st, 2009

Angel Rivera was kind enough, in commenting on my previous post, to say “Yes, what you do is information science.”
I wonder sometimes–both about the field called “information science” and about whether what I do fits within it.
A snarky way to put this might be:

Can you do information science if you’re not part of academia?

Or,

Can it be information science if it doesn’t appear in the form of proper scholarly articles in proper refereed journals?

Not that I haven’t had articles in refereed journals. I have–not many, but a few.
But most of what I’d call research, particularly in the past few years, hasn’t appeared there. (Actually, my major research projects in previous decades didn’t result in scholarly articles either. That’s another story.)

Research?

What I can say about the research behind the two library blog books and the liblog book:

  • I’m transparent about methodology.
  • I’m scrupulous about following the stated methodology.
  • I don’t discard “outliers” or otherwise manipulate the evidence to suit any hypotheses.
  • I use statistics conservatively and, I believe, appropriately–particularly in The Liblog Landscape 2007-2008, which includes a lot more statistical analysis than the others.
  • When I state hypotheses, I spell out the extent to which the evidence does not support the hypotheses.

OK, so some statisticians would say I barely use statistics at all in the last-mentioned book, but that’s another discussion.

Outsider research not properly reported?

On the other hand…

  • My reports on the research don’t include literature surveys, extensive notes on previous related research (such as it is), or the rest of the scholarly apparatus.
  • My reports appeared as books (later articles in Cites & Insights) rather than as articles.
  • Nobody’s vetted the research or replicated the work.
  • Most importantly: The work hasn’t been cited by any information scientists, as far as I can tell.

If research falls into publication and none of the scholars in the field cite it, does it exist?
I don’t have answers. I don’t fancy myself a scholar. I do dignify the work I’ve done as research, and believe it’s a lot more carefully (or at least exhaustively) done than some of the stuff I’ve seen Properly Published. (And I know from this and other projects that I could gather “statistically reasonable” samples that would prove almost any set of hypotheses I cared to offer.)
Comments?

Culture clashes and conference etiquette

Friday, June 19th, 2009

Here I am on ScienceBlogs, thanks to the loose definition of “science” that lets in “information science” and the even looser definition of “information science” that includes whatever it is I do.
And yesterday I found myself wondering whether I had any business being here–although the thought was more along the lines of “Holy cr*p! What’s going on here?” The situation had nothing to do with this blog–and a lot, I think, to do with culture clashes along the lines of that half-century-old notion of the Two Cultures.

The trigger

The trigger was a cluster of conversations taking place on FriendFeed and in blogs, some of them on this platform. It had to do with the propriety of liveblogging talks during a conference, talks not explicitly labeled as secret or closed. And after reading some of the conversations, I realized that, for all my decades as a systems analyst/programmer, I’m on the “humanities side” of this particular gulf.

The odd thing is that I’m not a big fan of liveblogging as a technique, for a couple of reasons:

  • As explored at length in “Speaking and attention: It all depends,” as a speaker, I used to have trouble with the idea of inattention–that, between backchannels, liveblogging, twittering, etc., the people in the audience weren’t really there fully.
  • Also as a speaker, I felt–and feel–that liveblogging and twittering tend to force speeches into a bullet-point mode: If a speaker wishes to build to a point using narrative means (“tell a story”), these bits-and-pieces techniques will work against effectiveness.
  • As a writer who frequently comments on what others have said, I encountered the dark side of liveblogging and conference reporting in general: Namely, what happens if you disagree with anything that’s reported. (If you’re high-fiving and saying “Wow, so-and-so made a great point,” all is well.) To wit, and particularly if the speaker is in one of the charmed circles, you get hit with some combination of “They never said that,” “You’re taking it out of context” and “That wasn’t what they meant at all.” (“Hit with” is the appropriate phrase.) After a couple of incidents, I came to a decision: I’d treat all conference reports, but specifically liveblogs and twitter streams, as fictional–I might note them, but would never, ever comment on them or believe they necessarily had anything to do with what was actually said (or meant).

But that’s a far cry from saying that liveblogging is either inappropriate or borderline unethical. I might say “I wish you’d listen for five minutes before you start tapping away–and by the way, feel free to leave if I’m not getting through to you,” but I would never say people were wrong to liveblog (or engage in backchannel chatter, which may or may not have anything to do with the actual speech).

The gulf?

The more I followed this particular controversy, the more I realized that “conference” in my context meant something very different than “conference” in the science context, at least as these scientists were using it.
Maybe–maybe–conferences-as-in-science, or at least some of them, can reasonably assume that, although anyone who registers can listen to a speech and, presumably, take notes on it and circulate those notes to friends & colleagues, that doesn’t make the contents of the speech public–that it’s reasonable to tell not only professional journalists but everyone that they shouldn’t reveal what was going on while it’s going on. (Maybe all such conferences should be held in Las Vegas, given the town’s advertising motto.)
But conferences-as-in-librarianship, at least all the ones I’ve ever attended, have had no such assumptions. On the other hand, very few speeches at those conferences involve stunning new discoveries backed by methodologically-sound research and even fewer involve any danger of being “scooped” or losing huge research grants because early information gets out too soon. As for the latter, so far I’ve encountered…well, none. People speak because they want to inform, to share ideas and winning strategies, to advocate, or because they’re On the Circuit and were invited to give Speech X to a new audience. (There are other motives, I’m sure, but sharing and informing are certainly the dominant ones.) People want what they say to reach a wider audience. Some speakers must love liveblogging, particularly those whose speeches lend themselves to the process.
Can we communicate across this gulf? Is it a real gulf, or is it edge cases? People like John D. and Christina P. convince me that the answer to the first question is yes, at least for some of us. The second one? Who knows?

Inconclusion

I don’t have a conclusion. There are culture clashes of sorts even within librarianship, to be sure, but most of the time I also see a shared culture, at least among the types of librarians most likely to be involved in the American Library Association. On the other hand, I just wrote (and then deleted) a whole set of internal “culture clashes,” many of them from (some) librarians within one specialty who (always wrongly) either treat other types of libraries/librarians as inferior or assume that all libraries are like their own specialty. And I’m fairly certain that there are many culture clashes within science, even if you leave out the social sciences.
I’ll keep trying to communicate.
Oh, and before you ask, I do at least vaguely understand entropy and the second law of thermodynamics–but thinking about or remembering that law is no more relevant to my everyday life or writing than any Shakespeare play is relevant to the everyday life of a nuclear physicist. On the other hand, when someone proposes a system that operates with 100% efficiency, a vague awareness of the second law does trigger my BS-meter…


A footnote and digression: If you want to get one of us wifty humanities types to pick up on the second law, for Gaia’s sake stay away from the Wikipedia entry! This site, though, ain’t bad: “If the first law of thermodynamics says you can’t win, then the second law of thermodynamics says you can’t even break even.” Followed by much more detail, to be sure.

50 Movie Comedy Classics Disc 7

Thursday, June 18th, 2009

Made for Each Other, 1939, b&w. John Cromwell (dir.), James Stewart, Carole Lombard, Charles Coburn, Lucile Watson, Eddie Quillan. 1:32.

At times, this movie seems like a comedy in the classical sense: a play in which some people survive until the end. There’s more drama than light-hearted humor, although there are a few funny scenes. James Stewart’s a young New York lawyer (who apparently makes almost no money) who goes to Boston to take a deposition and, while he’s there, meets and weds a beautiful young woman (Carole Lombard). His mother lives with them and treats her badly; his boss (and a nefarious associate) prevents him from going on a honeymoon cruise; he has no money but almost always has at least one servant (and there’s that cruise thing). Then there’s a baby; they desperately need more money and he should be named a partner, but instead he meekly accepts a 15% pay cut…and soon, it’s New Year’s Eve and the baby contracts a rare pneumonia. Along the way, one standing joke is that the head of the law firm (Charles Coburn, who does a fine job) can only hear you if he opens his jacket and you yell into his pie-plate-size hearing aid microphone.

Laughing yet? It gets funnier. The only way to save the baby is with a new serum—but there’s none in New York, Johns Hopkins sent all of theirs (apparently the only supply anywhere) to Salt Lake City; the latter can spare a little, but there’s a terrible storm—and a pilot wants $5,000 to fly it back. We get several minutes of a (different) pilot in an open-air plane flying through storms and even bouncing off a mountainside at one point, then the plane catching fire and the pilot parachuting with serum package in hand. Of course, everything works out—the baby’s saved, the father gets his partnership, the mother comes around, and all of the happy ending is in the last two minutes.

The print’s pretty good, the sound’s fine, the acting is also fine. Not exactly a laughathon, but well made and enjoyable. $1.25.

That Uncertain Feeling, 1941, b&w. Ernst Lubitsch (dir.), Merle Oberon, Melvyn Douglas, Burgess Meredith, Alan Mowbray, Eve Arden. 1:24.

Jill Baker (Merle Oberon) keeps getting the hiccups and is persuaded to see a psychoanalyst (Alan Mowbray). She becomes disillusioned about her husband (Melvyn Douglas) and meets a strange but interesting pianist (Burgess Meredith), with whom she becomes involved.

The husband plans to use psychology to get her back. After all sorts of incidents, it works—but it’s a very lightweight movie. Still, Burgess Meredith does a fine job, as do Oberon and Douglas—and the young Eve Arden (with her instantly-recognizable voice) has a small but significant role. Here’s the problem: For one reason or another, I didn’t review this right after seeing it—and after four days, I’d almost completely forgotten the plot and the performances. “Lightweight” may overstate it. Still, and despite some soundtrack damage, I’ll give it $1.25.

The Great Rupert (aka A Christmas Wish), 1950, b&w. Irving Pichel (dir.), Jimmy Durante, Terry Moore, Tom Drake, Frank Orth, Sara Haden, Queenie Smith, Chick Chandler. 1:28 [1:25].

A movie about vaudeville, the virtues of local investing, passing along good fortune—and a dancing squirrel. The squirrel’s trainer has to depart a basement apartment for lack of funds, sets the squirrel (The Great Rupert) free to roam, and runs into another vaudevillian family, the Amendolas (the father is played by Jimmy Durante), who have fled their last residence for similar reasons and wangle their way into the apartment without paying in advance. Meanwhile, the landlord finds out that a worthless gold mine he’d been conned into investing in is paying off, to the tune of $1,500 a week for his share. He won’t deal with banks and doesn’t trust his wife or musician son, so he stuffs the bills into a hole in the wall near the floor.

But the space behind the hole is now occupied by The Great Rupert, who finds these bills distracting, so he sweeps them away—right into the hole in the roof of the basement apartment, where they come fluttering down just after Mrs. Amendola prays for a little money. And the next week—after they’ve spent the money, between paying off debts, buying shoes for their beautiful daughter, and lending the rest to people in similar circumstances—she prays again, and another $1,500 comes fluttering down.

So there’s one plot. Others involve Amendola’s daughter (who’s a harpist), the son upstairs (who likes her—and it’s mutual—and plays tuba: he composes a piece for “two forgotten instruments” to play with her), a show-biz type who also likes her (and keeps taking her out for meals, but gets nowhere), the son getting conned into a worthless oil investment, and eventually simultaneous visits from the local police, IRS and FBI, all wanting to know where the family’s getting all the money. Meanwhile, as the landlord notices, “and Amendola” keeps showing up on various small businesses (because Mr. Amendola keeps lending or investing in them), all of which seem to be doing very well.

There’s more—but I shouldn’t give it all away. The ending is, well, as it should be but also more than a little peculiar. All in all, a fun movie, but the print’s severely damaged, with missing chunks of dialogue and visual damage. Given the damage, I can’t give this one more than $1.00.

Something to Sing About, 1937, b&w. Victor Schertzinger (dir.), James Cagney, Evelyn Daw, William Frawley, Mona Barrie, Gene Lockhart, Philip Ahn, Kathleen Lockhart. 1:33 [1:27].

Ladies and gentlemen, we have a winner. It’s easy to think of James Cagney as a tough guy, but he was also a first-rate hoofer and pretty good singer, and those talents shine in this romantic comedy. He’s Terry Rooney (or, rather, that’s the character’s bandleader name—his real name’s Thaddeus McGillicuddy), a bandleader/singer who’s been invited to Hollywood for a movie. He leaves, getting engaged to his singer/girlfriend just before getting on the train.

In Hollywood, the studio head makes sure that Rooney never realizes the extent of his screen chemistry and talent, trying to keep him from wanting a good contract. Rooney assumes he’s a disaster (and gets in a fistfight on set, which turns out to be staged to get a better film sequence) and has his fiancée fly out to Hollywood, where they get married and, with the picture wrapped, take off on a tramp steamer to the South Pacific. (This seems to be an era in which the train is the preferred way to go coast-to-coast, but you can fly if you’re in a hurry.)

Well, sir. The movie’s a big hit, Rooney’s a Big Star. When he returns, the studio exec wants to sign him up for seven movies (years?), but the contract says he has to be single. They come up with a gimmick: His wife will use her real married name (Mrs. McGillicuddy), live next door, and act as his personal assistant. Which is fine, but a female star makes a play for him, which an agent pushes on the press as a hot new romance—and his wife gets tired of it all.

That’s more of the plot than you really need. Let’s just say it all ends up as a romantic comedy should, with a few great song-and-dance numbers along the way (including on the tramp steamer, where they’re the only passengers and most of the show is crew entertaining one another, flawed a bit by the clearly visible accordion, guitar and harmonica sounding a lot like a string-and-brass ensemble). The print’s pretty good with a little damage. (One oddity is revealed in the IMDB trivia area. I noted that the studio was Grand National, which I knew only for B westerns—and it turns out this movie broke the studio financially.) I’ll give it $1.50—not great, but a winner.

Counting cycles

Sunday, June 14th, 2009

I picked up a little buzz about Google software engineers planning to rework the guts of some major open-source software to make it run faster. Since it wasn’t software I use, I didn’t read enough to remember what software, but it brought up memories…

Walking to school in the snow, 3 miles, uphill, both ways

No, this isn’t going to be one of those posts. I only wish we’d had the kind of raw processing power in my early years (decades?) as a systems analyst/programmer that we take for granted now. Most programmers today can spend more of their time on what needs to be done than on squeezing out cycles, and that’s as it should be.

This is just a little harmless nostalgia, none of it longing for those days.
(If you want my take as of three years ago as to how I think I’d deal with being young again, here’s your post.)

Early on, cycles really didn’t count

As I’ve noted elsewhere, my first systems analysis and programming involved an IBM 188 Collator. (Hmm. 20% of all Google results for the search [IBM ‘188 collator’] are my handiwork. That may be depressing.) In some ways, the 188 was a marvelous machine, particularly in 1961 when it was introduced: IBM’s first punch card equipment using solid-state circuitry and core.

That’s right, core memory–visible devices, just a wee bit larger than today’s RAM bits. I honestly don’t remember how much core the 188 had–maybe 64 bytes, but that’s vague memory. I do remember how you programmed it: with a double-wide board full of holes, into some of which you put jumpers to make circuit connections. Hard-wired programming…
For the circulation system, it wasn’t a question of using too many computing cycles. You got 650 cycles per minute–that is, one cycle for each card feed. Your program did whatever logical comparisons it could between two cards (one from each reader), given the limited core and your ingenuity, then either fed both cards into a common bin or one or both cards into other bins.
Sounds primitive. Was primitive. Worked.
(More technologically interesting, in some ways, was IBM’s last card sorter–by far the fastest, and using vacuum feed rather than pushers to move the cards and an optical sensor rather than brush contact, so that a card would last for thousands of sorts without wearing out. Without the speed and gentleness of the IBM 84 [2000 cards per minute, which is fast for a mechanical device processing little pieces of stiff paper], the circ system would never have kept up with Doe Library’s volume of business.)
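The collator’s basic job, comparing one card from each feed and routing cards to bins, is essentially a two-way match-merge of two decks already sorted on the same key. The Python below is only a rough sketch of that logic, not how the 188 was actually wired: each card is reduced to its key string, each key is assumed to appear at most once per deck, and the function and bin names are hypothetical.

```python
from typing import Dict, Iterable, List


def collate(primary: Iterable[str], secondary: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical match-merge of two decks sorted by key.

    Matching cards from both feeds drop into a common "merged" bin;
    unmatched cards from either feed go to a bin of their own.
    """
    bins: Dict[str, List[str]] = {"merged": [], "primary_only": [], "secondary_only": []}
    a, b = iter(primary), iter(secondary)
    card_a, card_b = next(a, None), next(b, None)  # one card from each feed
    while card_a is not None or card_b is not None:
        if card_b is None or (card_a is not None and card_a < card_b):
            bins["primary_only"].append(card_a)  # no match in the secondary deck
            card_a = next(a, None)
        elif card_a is None or card_b < card_a:
            bins["secondary_only"].append(card_b)  # no match in the primary deck
            card_b = next(b, None)
        else:
            bins["merged"].extend([card_a, card_b])  # keys match: feed both together
            card_a, card_b = next(a, None), next(b, None)
    return bins
```

For example, `collate(["001", "003", "005"], ["003", "004"])` puts the two "003" cards in the merged bin, "001" and "005" in the primary-only bin, and "004" in the secondary-only bin. Each pass through the loop corresponds roughly to one machine cycle: one comparison, one routing decision.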

A bit later, every cycle counted

Comparing computing power of, say, the IBM 360/65 that I did early programming on (indirectly, sending decks of cards over from Berkeley to UCSF) and the Intel Core 2 Duo notebook I’m writing this on is a chump’s game. Looking at some sources, I see “1.25 million calculations a second” for the ’65, which had one megabyte of RAM (rather a lot in those days). How does that compare with two CPUs, each with 1.6 billion processing cycles per second, and 4 gigabytes of RAM? You got me; I’m not sure there is a real answer to that question.

The thing is, doing library processing on a machine with that kind of power required a lot of optimization. The ideal language for the work I wanted to do was clearly PL/I, for its combination of logic and string processing–but the head of the systems office properly wouldn’t let me use PL/I because the early compilers just didn’t produce tight code. Instead, I used assembler (BAL)…
When PL/I (Optimizer) came along (and we’d moved up to a somewhat faster S/360), I could start using the high-level language–but not without paying attention. I remember a classic example: Cases where I needed to do translates to normalize characters for sorting purposes. The classy way to do that would be to include two strings of characters in the TRANSLATE statement, the source and the object. But, after trying that and seeing the results, I moved to using two 256-character strings (not variables), containing the source and object sets.
Why? Because it made a difference of at least 10:1 in the overall running time of the program–changing it from something we couldn’t use to something we could. And once you understood some assembler and learned to read PL/I’s pseudo-assembler output, you could see why:
If you were translating using variables, then the compiler would generate code that built two 256-character strings each time the translate was performed, then do the translate–a big, unwieldy loop of code.
If you were translating using fixed strings, then the compiler generated one assembler statement. One. I think the difference for the translation steps was at least two orders of magnitude, maybe even worse.
That’s just one example. There were many others. In the ’70s and early ’80s, I’d probably spend as much time optimizing code as writing it in the first place, maybe more–and after the first two programs, my first code was already fairly optimal.
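The TRANSLATE story comes down to building a 256-character table once instead of rebuilding it on every call. Python can’t reproduce what the PL/I compiler generated, but the sketch below illustrates the same two patterns using the standard `bytes.maketrans` and `bytes.translate`, which work with a 256-byte table much like the S/360 TR instruction does. The character sets (folding lowercase to uppercase and blanking a few punctuation marks) and the function names are assumptions for illustration, not the original program’s tables.

```python
# Hypothetical sort-normalization sets: fold a-z to A-Z and treat "-", ".", ","
# as blanks. These are illustrative assumptions, not the original tables.
SOURCE = b"abcdefghijklmnopqrstuvwxyz-.,"
TARGET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ   "

# Built once, at load time: a 256-byte table mapping every byte value.
SORT_TABLE = bytes.maketrans(SOURCE, TARGET)


def normalize_rebuild(key: bytes) -> bytes:
    """Slow pattern: rebuilds the full 256-byte table on every call."""
    table = bytes.maketrans(SOURCE, TARGET)
    return key.translate(table)


def normalize_static(key: bytes) -> bytes:
    """Fast pattern: reuses the table built once above."""
    return key.translate(SORT_TABLE)
```

Both functions return the same result; `normalize_static(b"Doe-Library, vol.2")` gives `b"DOE LIBRARY  VOL 2"`. The difference is only in the work repeated per call, which in a modern interpreter is small but in the PL/I case described above was the whole story.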

Don’t take me back…

With more abstract tools and less need to worry about cycles, I could have (potentially, at least) accomplished a lot more. So could we all. I think it’s great that a modern PC (Mac, Unix or Vista) can devote perhaps 90% of its cycles to system overhead–and still have plenty left for actual computation.

Still, sometimes things really do run slower than you’d like–and there are still lots of programmers who understand code efficiency. (I’d bet Google has hundreds of them!) They may be counting cycles at a more abstract level, but they’re still coming a little closer to the machine side of the man:machine boundary to get the job done.

Moved: A reminder

Saturday, June 13th, 2009

Just a quick reminder that Walt at Random has moved to a new address.

Please update your feeds, blogrolls, whatever.

I wonder whether some master spam agency checks for site moves: I’m suddenly getting a LOT more spam. But, of course, this site remains as Walt, Even Randomer, and the spam isn’t getting through…

My Back Pages: The C&I Version of Friday Fun

Saturday, June 13th, 2009

I’m not snarky by nature. Really I’m not.
Or, well, I’m recognizing that pure snark rarely improves a situation, and trying to reduce the amount of it within the e-zine. I got rid of one running section that was always negative by nature, just because it was always negative by nature.
But sometimes, just for a little while, I like to have a little fun with what I see as excesses. That section is My Back Pages, and like many other “last page” features it’s not intended to be taken too seriously. That section is also a “PDF bonus”–I don’t make it available as a separate HTML section.
The July 2009 Cites & Insights ends with six little items in My Back Pages–and, uniquely, I’ve already received feedback (and pushback) on one of them, from a commenter who takes me to task for questioning some of the statements in PC World’s “FuturTech” article last December. That reply might make it into a future issue (I used to have Feedback and Followup sections fairly often, but there hasn’t been as much feedback lately and followups usually emerge as separate essays), probably with a little pushback of my own.
I don’t think there’s much point in summarizing the six little pieces. They are, after all, there mostly for fun.
And that’s it for the current Cites & Insights.