Aaron Johnson Now with 50% less caffeine!

Posted
18 October 2003 @ 10pm

Tagged
Books, Software Development

Spidering Hacks

I fielded a couple questions this week about search engine safe URL’s both of them along of the lines of a) how do you create them? and b) are they even worth it? I’m written about how you can create them using Apache before, but one of the things I didn’t mention was that I think writing your own spider.. or at least attempting to, is a great first step to understanding why search engine safe URL’s are important. To that end, I’d suggest the “Spidering Hacks” book that Oreilly just released as a great starting point. The book uses Perl quite extensively, but it’s the process that matters. I’ve picked up “Programming Spiders, Bots, and Aggregators in Java” at Barnes and Noble quite a few times as well, but have never pulled the trigger.

If you’d rather read code, you can download the spider/indexing engine I’ve been working on (was working on!) to get some kind of idea of what goes into a spider.


No Comments Yet


There are no comments yet. You could be the first!

Leave a Comment

Martin Fowler @ NEJUG: Software Design in the Twenty-first Century Java UI icons