My third book for 1986 isn’t precisely mine and grew out of a two-year project at RLG, one that resulted in other later publications as well.
Among other things, it offered a set of MARC field occurrence statistics for a large body of bibliographic data, much earlier than anything comparable I can think of; that material, though, was actually a lengthy appendix.
Background
In 1984, the J. Paul Getty Trust funded a two-year RLG project with a number of aims. One portion of the overall project was the Patron Access Project. The goal of that project was to develop a design for a workstation-based patron access system to work with an online catalog based on RLIN software. The project rested on several assumptions, among them that online catalogs (and especially patron access) were just beginning a long process of development, evaluation and improvement; that scholars and research libraries might have special needs less likely to be fully addressed by commercial catalog development; that by 1990 scholars would have access to powerful microcomputer-based workstations; that RLIN itself (while “an unusually sophisticated database engine and retrieval methodology”) was not designed for direct use by scholars or other patrons; and that RLG should focus on the access needs of scholars as part of its overall goal.
I served as investigator for Phase 1 of the project, studying—exhaustively—the literature of online catalogs and preparing an extremely detailed outline of issues for online catalogs. (Remember when we used special outline software to develop outlines—before it was plausible to just use the outline functions of Word and competitors?) In 1985, I attended a CLR conference on online catalog screen displays and “came away convinced that the library community could benefit from large-scale tests of bibliographic display systems.”
Since I was still Product Batch manager (a post I gave up at the end of the project, becoming “assistant director for Special Services”), I was aware that RLG maintained the RLIN Monthly Process File, a file in MARC format containing 700,000 to 900,000 records (anything created or updated or used for catalog cards or other products during the previous six weeks)—and that it was feasible to use that file as a testbed. (At the time, computer capacity and handling methods didn’t really allow for processing the entire RLIN database for this sort of thing.) I developed the Bibliographic Display Testbed program, making it possible to try out a proposed set of display rules and see the results—both sample screens and how often, for example, records would run over to second or third screens.
A sidebar about the times and technology. In 1985-1986, and a few years beyond that, most library computer displays, especially for online use, were character-based, showing 24 lines of 80 characters each (fixed-width characters). You typically got from one screen to the next by typing a command, certainly not by scrolling down an effectively-infinite-length virtual screen. (What would you scroll with? Those smart terminals didn’t have mice.)
So there were real reasons to be concerned with how often users would need to go past the first screen of a record display, especially given the sense that a fair number of users might not bother.
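If you want a concrete sense of what the testbed measured, here is a minimal modern sketch of the counting, nothing like the original code: wrap each record’s formatted fields to 80 columns and see how many 24-line screens the result needs. The frame allowance and the idea of starting from already-formatted field strings are placeholders for whatever a particular display design specified.

```python
import textwrap
from collections import Counter

SCREEN_WIDTH = 80    # fixed-width characters per line
SCREEN_HEIGHT = 24   # lines per screen
FRAME_LINES = 4      # placeholder for the common top/bottom frame

def lines_needed(field_strings, width=SCREEN_WIDTH):
    """Display lines used once each formatted field wraps to the screen width."""
    return sum(max(1, len(textwrap.wrap(text, width=width))) for text in field_strings)

def screens_needed(field_strings, frame_lines=FRAME_LINES):
    """How many 24-line screens one record's display would occupy."""
    usable = SCREEN_HEIGHT - frame_lines
    return -(-lines_needed(field_strings) // usable)   # ceiling division

def tally(rendered_records):
    """rendered_records: one list of formatted field strings per record."""
    counts = Counter(screens_needed(fields) for fields in rendered_records)
    total = sum(counts.values())
    for screens in sorted(counts):
        print(f"{screens} screen(s): {counts[screens]} records ({counts[screens] / total:.1%})")
```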
I’d worked with MARC records, and specifically RLG’s MARC records (which included a lot of archival and manuscript control records), long enough to suspect that bibliographic data was too heterogeneous for small samples to be terribly meaningful. We ran some 100-record tests, which confirmed that suspicion: they varied so much from test to test as to be nearly useless.
RLG concluded (at my suggestion) that we could provide a useful product for the wider library community by testing a range of possible display designs and publishing the results. That would require work time, more than one analyst—and a means of distributing the result. Some portion of the time of two other library systems analysts (Lennie Stovel and Kathleen Bales) was made available, and Knowledge Industry Publications, Inc. agreed to publish the results (with RLG owning the copyright and receiving what royalties might ensue, since this work was done on work time).
Foreground
This was a team effort. I wrote the programs and documentation, managed the large-scale test runs and wrote most of the text for the book. I also provided some possibilities for display design, based on the Patron Access Project study. Lennie Stovel provided much of the display design, investigating different possibilities for the top and bottom of the screen, different label alternatives and different sets of data elements. Kathleen Bales (some of you know her as Kathy) worked with Lennie to prepare the final sets of data elements and labels and to refine the designs. Both of them reviewed my program design and suggested improvements.
We were looking at several issues for online catalog design: which fields and subfields to include in each kind of display, how to arrange and group the fields, whether to use labeled or cardlike displays, what labels to use and where to put them, what techniques to use to improve legibility (remember, we’re talking about fixed-width characters with relatively low resolution), how many different display types to provide and what other information to put on the screen (and where!).
We saw five major questions: Does the display provide an appropriate amount of information? Will patrons understand the information as it is displayed? Is the display readable and attractive? Will patrons be able to find information rapidly and to find all the information needed? Will patrons be able to view the information on a single screen?
As far as we knew, almost no work had been done on the final question and not enough on the others.
We did hundreds of early test runs, mostly using a single day’s activity (19,000 to 25,000 records at the time), but several dozen using the entire six-week file. Based on those tests, we concluded that three levels of display were the minimum needed: brief, medium and complete, each possibly either cardlike or labeled. The aim was for a brief display to leave at least seven lines for holdings information at least 90% of the time and for a medium display to fit on one screen (with at least three lines of holdings) 90% of the time. It was clear that complete labeled displays would usually require at least two 24-line screens, but that complete cardlike displays could usually fit on one screen with minimal holdings.
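In concrete terms, those targets are just line budgets on a 24-line screen: the bibliographic portion of a brief display can use at most 24 lines minus the frame minus the seven holdings lines. A small sketch of the check, assuming a testbed run has already produced one line count per record for the design being tested (the frame allowance here is a placeholder, not the frame we actually used):

```python
SCREEN_HEIGHT = 24
FRAME_LINES = 4                                   # placeholder for the common top/bottom frame
BRIEF_BUDGET = SCREEN_HEIGHT - FRAME_LINES - 7    # leave at least 7 lines for holdings
MEDIUM_BUDGET = SCREEN_HEIGHT - FRAME_LINES - 3   # fit with at least 3 lines of holdings

def pct_fitting(line_counts, budget):
    """Percentage of records whose bibliographic display needs no more than `budget` lines."""
    return 100.0 * sum(1 for n in line_counts if n <= budget) / len(line_counts)

# line_counts would come from a testbed run for one display design, e.g.:
# print(pct_fitting(brief_counts, BRIEF_BUDGET), pct_fitting(medium_counts, MEDIUM_BUDGET))
```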
We finally arrived at a common frame—the top and bottom of each screen—and a common set of data elements for medium displays. For various reasons, the dataset used for testing was reduced to a subset containing 395,000 to 405,000 records (or, for public libraries, a constant set of just under 35,000 records). We ran final tests against those records to determine percentages, and used a fixed set of eight representative records to prepare mockup displays.
The result was this 359-page 8.5″ x 11″ paperback. It includes eight chapters, most chapters combining discussions of specific display design possibilities, tables of the efficiency of those options and figures showing how the options worked out in practice. (There are a lot of figures—the book’s mostly tables and figures—with most chapters having anywhere from 46 to 99 half-page screen simulations and four or five tables each.)
Appendix A included field occurrence tables (showing for each USMARC field the occurrences per hundred records and the average field length) for all records except archival & manuscript control (a testbed of more than 628,000 records; that table is four pages long); field occurrences for 34,941 public library records; a comparison of two different 600,000-record samples (taken four months apart) for selected fields; and field occurrences for each bibliographic format, along with how each format’s sample performed for each of 28 display possibilities. Sample sizes included 522,000 book records; 3,975 AMC records (which were and are distinctly different from most others); a mere 408 machine-readable data file records (there weren’t many of those back in the mid-1980s!); 1,000-odd maps; 11,600-odd musical scores; 50,000 serials; 4,450 sound recordings; and 1,600-odd visual materials. Another appendix provided a full MARC-tagged listing for each record used in most of the tests.
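There is nothing exotic about computing those two numbers (occurrences per hundred records and average field length), and anyone can run the same tally against their own MARC data today. Here is a sketch using the pymarc library; the file name is hypothetical:

```python
from collections import defaultdict
from pymarc import MARCReader

occurrences = defaultdict(int)   # MARC tag -> total occurrences
total_length = defaultdict(int)  # MARC tag -> total characters of field content
record_count = 0

with open("records.mrc", "rb") as handle:        # hypothetical file of MARC21 records
    for record in MARCReader(handle):
        if record is None:                       # pymarc yields None for unreadable records
            continue
        record_count += 1
        for field in record.fields:
            occurrences[field.tag] += 1
            total_length[field.tag] += len(field.value())

for tag in sorted(occurrences):
    per_hundred = 100.0 * occurrences[tag] / record_count
    avg_length = total_length[tag] / occurrences[tag]
    print(f"{tag}  {per_hundred:8.1f} per 100 records   avg length {avg_length:6.1f}")
```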
What was the impact of this book? I can’t say. I believe that the related Patron Access: Issues for Online Catalogs (more about that later) helped to convince designers to give “gutter-aligned” labeled displays a try—that is, displays where the label is right-aligned and the field text is left-aligned. Such displays were almost unknown before that book was published and became nearly standard (for labeled displays!) in later years: They sounded strange, but we found that they worked very well.
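For those who have never seen one, a gutter-aligned labeled display is easy to mock up: pad each label on the left so every label ends at the same column, then start the field text a couple of spaces past that column. A quick illustration with made-up labels and sample data:

```python
import textwrap

def gutter_aligned(pairs, gutter=12, width=80):
    """Right-align labels to a common gutter column; left-align field text after it."""
    lines = []
    for label, text in pairs:
        wrapped = textwrap.wrap(text, width=width - gutter - 2) or [""]
        lines.append(f"{label:>{gutter}}  {wrapped[0]}")
        for continuation in wrapped[1:]:          # continuation lines indent past the gutter
            lines.append(" " * (gutter + 2) + continuation)
    return "\n".join(lines)

print(gutter_aligned([
    ("AUTHOR", "Crawford, Walt"),
    ("TITLE", "Bibliographic displays in the online catalog"),
    ("PUBLISHED", "White Plains, NY : Knowledge Industry Publications, 1986"),
]))
```

On a fixed-width screen the labels and the data each form a clean vertical edge along the gutter, which is much of the layout’s appeal.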
Are there huge differences between the field occurrence rates we found back then and those in the much larger grant-sponsored study (against a copy of most of the OCLC database) done more recently? Not really. The newer study took things down to the subfield level, but the general results were quite similar—as you’d expect. It’s not news that most bibliographic records only use a handful of fields; the question is whether the special cases that require oddball fields should be supported by the formats. I always believed they should, and continue to believe that, but—again—that’s another discussion.
Crawford, Walt, Lennie Stovel and Kathleen Bales. Bibliographic Displays in the Online Catalog. Professional Librarian Series. White Plains, NY: Knowledge Industry Publications, 1986. ISBN 0-86729-198-2 (pbk.)