66,504 Facts & Formulas in 1.98MB

Very quick update on what’s now The Liblog Landscape 2007-2010 (I’m reserving The Way We Blog for a possible five-year overview, maybe, perhaps, if I don’t cure this obsession).

  • I think, believe, hope I have the derivative pages populated on the master spreadsheet–that is, all of the calculated data as well as the observed data.
  • After problems I had last year, I’ve had the good sense (?) to: a. not try to put everything on one massive page; b. save a copy of the master spreadsheet; and c. perhaps most important, save another copy where every page is values & number formats, without any formulas or references. (The master sheet is lousy with both.) This should minimize, maybe even eliminate, problems with screwing up data while sorting, summarizing, etc., particularly since I won’t actually use the “fixed copy” (the copy with no formulas). I’ll use a working copy of it, feeling free to not only hide but delete columns for convenience…’cuz I can always restore the whole thing.
  • Damn, but “If” formulas with four levels of testing are clumsy to get right…but that’s what I needed, and I got it, eventually. (That is: there are four “If” statements in all within the overall nested statement. Trust me, it’s necessary…partly because I need to distinguish between “0 because none there,” “0 because unable to count” and “0 because the blog didn’t exist yet.”)
  • Excel does reasonably well on compactness. The master spreadsheet includes six pages, each with 1,305 rows (labels and 1,304 blogs), with–respectively–24, 9, 12, 20, 20, and 28 columns. All cells are populated (frequently with “dummy numbers,” which are always negative). There’s a lot of duplication among the columns, in order to make analysis a little less screwy (that is, each major segment of analysis has its own page), but there are, in fact, 51 distinct columns among the–lessee–113 columns. Of those, 24 are observed items, 27 are calculated items.
  • So, depending on how you look at it–and ignoring column headers–there are either 66,504 data items (the 51 distinct columns times 1,304 blogs, including blog names and URLs, both of which can be long) or 147,352 data items (all 113 columns times 1,304 blogs) in the master spreadsheet.
  • All of that stores in Excel 2007, including format information, as a 1.98MB spreadsheet with formulas–and a mere 1.05MB spreadsheet without formulas. That strikes me as pretty efficient storage.
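
For anyone who wants to check the arithmetic behind those two totals, here is a minimal sketch in Python. It isn't part of the actual workflow (that's all Excel), and the variable names are made up for illustration. The numbers themselves come straight from the bullets above.

```python
# Column counts for the six pages of the master spreadsheet, as listed above.
columns_per_page = [24, 9, 12, 20, 20, 28]

# Each page has 1,305 rows: one label row plus 1,304 blogs.
blogs = 1305 - 1

# 24 observed items plus 27 calculated items.
distinct_columns = 24 + 27

# Total columns across all six pages, duplicates included.
total_columns = sum(columns_per_page)

# Data items counting each distinct column once.
distinct_items = distinct_columns * blogs

# Data items counting every column on every page.
all_items = total_columns * blogs

print(total_columns)   # 113
print(distinct_items)  # 66504
print(all_items)       # 147352
```

The smaller figure counts each distinct column once; the larger one counts every cell that's actually sitting in the spreadsheet.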

Now, to start messing with the working copy of the “fixed” spreadsheet…and writing it up (well, I’ve already written up most of the metric definitions). In case anybody cares, I’m currently tending toward a “hybrid solution”–publishing most chapters (excluding the first, which will have all the hot overall items and will be developed as I write the book) in C&I, also appearing as 6×9 PDF separates, and–when it’s ready–making the whole book, with index and first chapter, available through Lulu for the few who want it.

Oh, and finishing up/publishing C&I for November 2010–which will not include any of this project.
