The Data Life Cycle of a Blog Post

Cool flash infographic in the latest issue of Wired that shows what happens to your blog post after you click the ‘publish’ button (I’ll save you the hassle of actually viewing it: after you click the ‘publish’ button, exciting things like ping servers, data miners, search engines, text scrapers, aggregators, social bookmarking sites, online media, spam blogs and finally readers get involved). Since it’s Wired and not XML Journal, they stopped at the infographic, but man, it should would be cool to see all the ways that data massaged, reformatted, sliced and diced and transmitted, because there’s a lot that happens in that process. Just for the fun of it, I’m gonna walk through the scenarios I know about.

First, you click the publish button. But that might be a publish button on a desktop blogging client like Windows Live Writer or it might be the publish button in Microsoft Word or it might be a real live HTML button that says ‘publish’. So before you even get to the publish part, we’ve got the possibility of the MetaWeblog API (which is XML-RPC, effectively XML over HTTP), Atom Publishing Protocol (again effectively XML over HTTP) or a plain HTTP (or HTTPS!) POST.

OK, so now your blog post has been published on your blog. What next? Probably unbeknownst to you, your blog post has been automatically submitted to one or more ping servers using XML-RPC (XML over HTTP). Because search engines got into the blogging business, you can even ping Google and Yahoo (curiously not Microsoft, why?). If you don’t want to hassle with a bunch of different sites, you can always use pingomatic.com, which will ping (as of 1/27/2008) twenty one different ping servers for you.

Oh, I forgot to mention. If you’re using TypePad, Livejournal or Vox, the information about your blog post isn’t sent to these ping servers using XML-RPC, it’s streamed as XML in real-time over HTTP to many of the same parties.

Great, your blog post has now been sent to everyone, you’re good right? Nope. Now comes the onslaught of spiders and bots, awoken by the ping you sent, who will request your feed (RSS / Atom over HTTP) and your blog post (HTML over HTTP) and your first born child again and again and again. And now that your blog post is published and assuming that you’ve published something of value, you’ll see real people stop by and comment on your blog post and maybe bookmark it in a site like del.icio.us or ma.gnolia.com, snipping a quote from your blog post and then publishing that snippet to their own blogs or to their bug tracker and now your blog post has replicated, it lives in small parts all over the web, each part getting published and spidered and syndicated and ripped again and again and again. It’s beautiful isn’t it?

Leave a Reply

Your email address will not be published. Required fields are marked *