{"id":524,"date":"2003-10-18T22:03:17","date_gmt":"2003-10-19T02:03:17","guid":{"rendered":"http:\/\/wordpress.cephas.net\/?p=524"},"modified":"2003-10-18T22:03:17","modified_gmt":"2003-10-19T02:03:17","slug":"spidering-hacks","status":"publish","type":"post","link":"https:\/\/cephas.net\/blog\/2003\/10\/18\/spidering-hacks\/","title":{"rendered":"Spidering Hacks"},"content":{"rendered":"<p>I fielded a couple of questions this week about search engine safe URLs, both of them along the lines of a) how do you create them? and b) are they even worth it?  I&#8217;ve <a href=\"http:\/\/www.cephas.net\/blog\/2002\/11\/15\/search_engine_safe_urls.html\">written<\/a> about how you can create them using Apache before, but one of the things I didn&#8217;t mention was that I think writing your own spider, or at least attempting to, is a great first step to understanding why search engine safe URLs are important.  To that end, I&#8217;d suggest the &#8220;<a href=\"http:\/\/www.oreilly.com\/catalog\/spiderhks\/index.html\">Spidering Hacks<\/a>&#8221; book that <a href=\"http:\/\/www.oreilly.com\/catalog\/prdindex.html\">O&#8217;Reilly<\/a> just released as a great starting point.  The book uses Perl quite extensively, but it&#8217;s the process that matters.  I&#8217;ve picked up &#8220;<a href=\"http:\/\/www.amazon.com\/exec\/obidos\/ASIN\/0782140408\/cephasnet-20\">Programming Spiders, Bots, and Aggregators in Java<\/a>&#8221; at Barnes and Noble quite a few times as well, but have never pulled the trigger.<\/p>\n<p>If you&#8217;d rather read code, you can download the <a href=\"http:\/\/www.cephas.net\/projects\/\">spider\/indexing engine<\/a> I&#8217;ve been working on (was working on!) to get some kind of idea of what goes into a spider.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I fielded a couple of questions this week about search engine safe URLs, both of them along the lines of a) how do you create them? and b) are they even worth it? 
I&#8217;ve written about how you can create them using Apache before, but one of the things I didn&#8217;t mention was that I &hellip; <a href=\"https:\/\/cephas.net\/blog\/2003\/10\/18\/spidering-hacks\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Spidering Hacks<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9,2],"tags":[],"_links":{"self":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/524"}],"collection":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/comments?post=524"}],"version-history":[{"count":0,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/524\/revisions"}],"wp:attachment":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/media?parent=524"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/categories?post=524"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/tags?post=524"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}