Distilling data vs. harvesting data

I’ve been on vacation for the last week out on the Oregon coast in a little town with no stop lights, one restaurant and a lot of sand. On the drive back home today I saw a number of tractors harvesting hay (actually baling hay, but work with me). As far as I can tell, that’s a process where a machine goes back and forth through an entire field of already cut hay and binds big chunks of it into round or rectangular bales wrapped in twine (Wikipedia has a really in-depth article about the whole process if you’re curious). Anyway, at work I’ve been referring to the process of winnowing down the large amounts of data that we all cope with as ‘harvesting data’, but then tonight I read this great article about Jon Stewart in the NY Times and I’m convinced that it would be better to say ‘distilling data’. Here’s the quote (which is actually attributed to Stephen Colbert) that made me think that:

“You have an enormous amount of material, and you have to distill it to a syrup by the end of the day. So much of it is a hewing process, chipping away at things that aren’t the point or aren’t the story or aren’t the intention. Really it’s that last couple of drops you’re distilling that makes all the difference. It isn’t that hard to get a ton of corn into a gallon of sour mash, but to get that gallon of sour mash down to that one shot of pure whiskey takes patience” as well as “discipline and focus.”

In the article, Stephen Colbert is referring to the process that The Daily Show team goes through, well, daily, to get to a point where everyone can get on the air and talk about truthiness. There’s a million things going on in the world, but they only have 30 minutes a night. Same thing with us: there’s more than a trillion pages on the internet (literally) but we’ve only got 24 hours in a day. We need better distilleries, not better harvesters.

Maybe it seems like I’m splitting hairs, but I think using the correct analogy is a helpful framing device: harvesting is the process where you collect all the data. I think we’re all doing pretty well with our aggregators and personalized portals and social networks that grab RSS feeds and status updates and pictures all into one place. But after a week off, I think that harvesting isn’t my problem: it’s distillation, the process by which we separate the useful from the useless, the required from the optional, the interesting from the boring.

Getting my email is going to suck.
