Comparing potatoes and truffles

Remember Wired magazine’s absurd “The Web is dead” cover article (September 2010)?

I can’t think of anything that was right about the article. One of the things that was most wrong was the big graph that showed how the web was dying–by plotting all internet traffic, in bytes, on a market-share graph (that is, one where the Y axis is always filled, since it goes up to 100% and the segments show percentage of each area over time).

One thing that was wrong with it is that this kind of graph is almost always misleading or meaningless when an overall space is either growing or shrinking, since it represents percentages, not absolutes. If Amazon goes from selling 90% of ebooks when ebook sales are $1 million per year to selling 30% of ebooks when ebook sales are $1 billion per year, I can assure you nobody at Amazon is saying “Damn. We’ve died in the ebook space.” But that’s what a market-share graph would show: A dramatic, awful, terrible decline in Amazon ebook sales.
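To put numbers on the Amazon example, here is a minimal sketch in Python. The function name is mine, and the dollar figures are the hypothetical ones from the paragraph above, not real sales data.

```python
def absolute_sales(market_total, share_percent):
    """Dollar sales implied by a market total and a percentage share."""
    return market_total * share_percent // 100

# Hypothetical figures from the example above, not real Amazon data.
early_sales = absolute_sales(1_000_000, 90)        # 90% of a $1 million market
later_sales = absolute_sales(1_000_000_000, 30)    # 30% of a $1 billion market

growth_factor = later_sales / early_sales

print(f"Share fell from 90% to 30%, but sales went from "
      f"${early_sales:,} to ${later_sales:,} "
      f"(about {growth_factor:.0f} times as much).")
```

The share drops by two thirds while the dollar amount grows by a factor of several hundred, which is exactly the part a 100%-stacked chart hides.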

The other is even more absurd, and is where I get “potatoes and truffles.” Well, you know, they’re both edibles that come from the ground, so clearly truffles are dead, since the weight of potatoes sold each year must surpass the weight of truffles by several orders of magnitude. Actually, they’re both tubers, so what’s the difference? (“Several orders of magnitude”: I can’t readily find the current total production/sale of truffles, but it apparently peaked at “several hundred tonnes” early in the last century, so I’d guess it’s no more than, say, 314 tonnes now. Which is a deliberate choice because 2008 worldwide production of potatoes was 314 million tonnes. So figure at least a million times as many potatoes, by weight. And there’s even the time element, since truffle production has dropped enormously while potato production continues to rise.)

The other fallacy? Choosing one measurement and assuming that it’s meaningful in other contexts. In this case, choosing data volume (bits or bytes) and assuming it relates somehow to “where people spend their time.”

I choose that quotation because here’s how Wired responded to the criticisms of their chartjunk in this case:

“While not perfect, traffic volume is a decent proxy for where people spend their time.”

Bullshit.

Last Saturday, we had a friend over and spent a wonderful two hours and 31 minutes watching the glorious Blu-ray version of The Music Man. I felt as though I’d never really seen the picture before. It was great. It was also 2.5 hours.

I’m guessing The Music Man probably took up around 40GB (a dual-layer Blu-ray Disc has 50GB capacity).

Today, I’ll start reading a mystery novel that I’m certain is going to be enormously entertaining as well. At 250 pages, the text in it would probably occupy about–well, let’s call it 80,000 bytes, although that’s probably on the high side.

By Wired’s “reasoning,” it’s a fair approximation to say that I should spend around 0.018 seconds reading that book, since it has only one five-hundred-thousandth as much data as The Music Man–and “traffic volume is a decent proxy for where people spend their time.”

In the real world, I’ll probably spend three or four hours reading the novel, maybe a little longer.
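For anyone who wants to check the arithmetic, here is a small sketch of the “bytes as a proxy for time” calculation. All the numbers are my rough guesses from this post, and the function name is just for illustration.

```python
# Figures taken from this post; they are estimates, not measurements.
MOVIE_BYTES = 40 * 10**9       # rough guess for the Blu-ray feature
MOVIE_SECONDS = 2.5 * 3600     # about two and a half hours of viewing
BOOK_BYTES = 80_000            # the post's estimate for the novel's text

def time_predicted_by_bytes(item_bytes, ref_bytes, ref_seconds):
    """Time 'predicted' if time spent scaled linearly with data volume."""
    return ref_seconds * item_bytes / ref_bytes

predicted_seconds = time_predicted_by_bytes(BOOK_BYTES, MOVIE_BYTES, MOVIE_SECONDS)

# Low end of the three-to-four-hour reading estimate.
actual_seconds = 3 * 3600
error_factor = actual_seconds / predicted_seconds

print(f"Predicted reading time: {predicted_seconds:.3f} seconds")
print(f"Actual reading time is about {error_factor:,.0f} times longer")
```

With these inputs the proxy is off by a factor of several hundred thousand, even using the shortest reading time I think is plausible.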

An extreme case?

OK, so a Blu-ray Disc is an extreme case. Internet traffic almost never includes 30 Mb/s streams, which is roughly BD level. But it does include loads of video, probably at traffic rates between 250 kb/s and 6 Mb/s, and audio, at traffic rates of at least 64 kb/s for anything with halfway decent sound (“halfway decent” is the operative term here).

So if I watch a one-minute YouTube clip, it’s likely that the traffic amounts to at least 1.9 megabytes (at the lowest datarate supported by YouTube) and more likely at least twice that much.

How much time would it take me to read 1.9 megabytes’ worth of text, even with HTML/XML overhead? Without overhead, that’s about 300,000 words, or the equivalent of three long books. With PDF overhead (which, for embedded typefaces, is considerably more than HTML overhead), that’s four typical issues of Cites & Insights–but for the text itself (with Word .docx overhead), it’s at least a year of C&I. I pretty much guarantee that anybody who reads C&I at all spends more than a minute doing so, even though the data traffic only amounts to a few seconds’ worth of YouTube.
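Here is a rough sketch of the one-minute-clip comparison. The 250 kb/s rate is the low-end figure mentioned above; the bytes-per-word and reading-speed values are assumptions of mine for illustration, not measured numbers.

```python
CLIP_SECONDS = 60
VIDEO_BITS_PER_SECOND = 250_000   # 250 kb/s, taking k = 1000 (assumption)
BYTES_PER_WORD = 6                # assumption: five letters plus a space
WORDS_PER_MINUTE = 250            # assumption: a typical reading speed

# Bytes transferred by one minute of low-rate video.
clip_bytes = CLIP_SECONDS * VIDEO_BITS_PER_SECOND // 8

# How many words of plain text fit in the same number of bytes.
words = clip_bytes // BYTES_PER_WORD

# How long it would take to read that much text.
reading_hours = words / WORDS_PER_MINUTE / 60

print(f"One-minute clip: {clip_bytes:,} bytes")
print(f"Same bytes as plain text: {words:,} words")
print(f"Reading time at {WORDS_PER_MINUTE} wpm: {reading_hours:.1f} hours")
```

Under those assumptions, one minute of video carries the same traffic as text that would take most of a day to read. Change the assumptions and the numbers move, but not by enough to rescue the proxy.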

Equating “traffic” for text, or even still photos, with “traffic” for sound or video, as being in any way meaningful in terms of time spent is just nonsense. Wired says “We stand by the chart.” That says a lot about Wired–and almost nothing about the present or future of the web.

2 Responses to “Comparing potatoes and truffles”

  1. Seth Finkelstein Says:

    I’m reminded of the arguments over the fate of USENET, and the similar aspect where volume of bytes was a factor in the argument – including many, many, encoded binaries.

  2. walt Says:

    Seth: Well said. The claim that raw traffic relates directly to significance or use is not a new one; I used to give a calculated analogy in some talks, comparing the bandwidth used by all viewing of a midrange TV show with all book publishing, as one example of why such discussions are absurd.

