Verity Spider tips & tricks

Thanks to Phil for sending me a link to the Verity Spider tips & tricks on daemon.com.au. Daemon is/was a big Spectra shop and probably used the spider to search Spectra sites on a regular basis. So why doesn’t that page show up in a Google search for “verity spider” or “verity spider tips”? Maybe it’s because of the way their content management system works, where each page is denoted by a CF UUID appended to the URL. This method probably helps the developers, but in the long run it isn’t so good for getting ranked or even indexed by the larger search engines… which led me to today’s research: mod_rewrite.

I got my MCSE from Microsoft a couple of years ago, so my first exposure to web servers was IIS. IIS was then, and for the most part is now, very pointy-clicky (although I’ve heard that .NET IIS will have a text-based configuration file). Anyway, Apache wasn’t something I played with much until the last year, when I brought up a couple of Linux machines and thus Apache.

So today I dove headfirst into mod_rewrite and came up with a solution for making the next version (due out any day now) of karensrecipes.com more search engine friendly. In short, to get to a recipe on the development site right now, you’d type in something like this:

http://www.karensrecipes.com/recipes/detail.jsp?r=18

Again, just like the link I mentioned above, this is not an example of how to impress the search engines. Add some kung fu regular expressions and a dab of JkMount knowledge, and we now get something like this:

http://www.karensrecipes.com/recipes/18/Steamed_Mussels.jsp

and in your Apache httpd.conf:

RewriteEngine on
# Map /recipes/<digits>/<anything> onto the real JSP, keeping the digits as r
RewriteRule ^/recipes/([0-9]+)/.*$ /recipes/detail.jsp?r=$1 [PT]

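which in English says something like “if the request starts with ‘/recipes/’, is followed by one or more digits, and then by a ‘/’ and any number of other characters, rewrite the URL to /recipes/detail.jsp?r= plus those digits.” The [PT] (passthrough) flag hands the rewritten URL back to Apache’s normal URL mapping, which is what lets the JkMount pick it up. (wanna know more about regular expressions? get this fabulous book!)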
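If you want to poke at the pattern outside of Apache, here is a rough Python sketch that applies the same regular expression to a few paths. The `rewrite` function is a made-up helper for illustration only; it imitates the substitution and knows nothing about [PT], JkMount or anything else Apache does with the result.

```python
import re

# Same pattern as the RewriteRule above.
PATTERN = re.compile(r"^/recipes/([0-9]+)/.*$")


def rewrite(path):
    """Hypothetical helper: return the internal URL for a friendly path,
    or the path unchanged when the pattern does not match."""
    match = PATTERN.match(path)
    if match is None:
        return path
    # group(1) plays the role of $1 in the RewriteRule.
    return "/recipes/detail.jsp?r=" + match.group(1)


if __name__ == "__main__":
    for path in (
        "/recipes/18/Steamed_Mussels.jsp",
        "/recipes/18/anything-at-all",
        "/recipes/detail.jsp",
    ):
        print(path, "->", rewrite(path))
```

The first two paths both end up at /recipes/detail.jsp?r=18, since everything after the second slash is ignored. The third has no digits after /recipes/, so it is left alone, which is why the real detail.jsp URL keeps working.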

Pretty snazzy eh? It gives me warm feelings inside because my JSP/Servlet code doesn’t have any knowledge that funny stuff is being done to the URL in Apache, which means you can do all sorts of chicanery to your URL without having to change a lick of server side code.
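The one place that does need to know about the friendly form is whatever writes out the links. As a sketch only (the `friendly_url` name and the underscore rule are my assumptions, not what karensrecipes.com actually runs), building such a link from a recipe id and title could look like this in Python:

```python
import re


def friendly_url(recipe_id, title):
    """Hypothetical helper: build a /recipes/<id>/<Title>.jsp style link."""
    # Assumption: collapse every run of characters other than letters
    # and digits into a single underscore, then trim stray underscores.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return "/recipes/%d/%s.jsp" % (recipe_id, slug)


if __name__ == "__main__":
    print(friendly_url(18, "Steamed Mussels"))
```

Because the rewrite rule only reads the number, the title part is purely for people and search engines; a recipe can be renamed without breaking old links.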

2 thoughts on “Verity Spider tips & tricks”

  1. Hi Aaron,

    I’m also trying to combine JkMount and RewriteRule
    with passthrough [PT], but I’m finding that client sessions get lost. The cookie information is probably not passed to the final request.

    Have you noticed this problem too? Can you think of any ways around it?

  2. Interestingly we never used Verity Spider for Spectra sites — we’d search the database with Verity instead of the generated pages. I came across the Spider issue when I was preparing a talk for MM DevCon.

    We recently made use of some servlet technology to build friendly URLs for our website 🙂 It’s a new feature of the latest cut of FarCry CMS Opensource (http://farcry.daemon.com.au/) — and well worth a look. The Friendly URL servlet can be used in any CFMX app — more details at http://www.spike.org.uk/
