Thanks to Phil for sending me a link to the Verity Spider tips & tricks on daemon.com.au. Daemon is/was a big Spectra shop and probably used the spider to search Spectra sites on a regular basis. So why doesn’t that page show up in a google search for “verity spider” or “verity spider tips“? Maybe it’s because of the way their content management system works, where each page is denoted by a CF UUID appended to the URL. This method probably helps the developers, but in the long run, isn’t so good for getting ranked or even indexed by the larger search engines… which led me to todays’ research: mod_rewrite. I got my MCSE from Microsoft back a couple years ago, so my first exposure to web servers was IIS. IIS was then and for the most part, is now very pointy clicky (although I’ve heard that .NET IIS will have a text-based configuration file). Anyway, Apache wasn’t something I played with much until the last year, when I brought up a couple linux machines and thus Apache. So today I dove headfirst into mod_rewrite and came up a solution for making the next version (due out anyday now) of karensrecipes.com more search engine friendly. In short, to get to a recipe on the development site right now, you’d type in something like this:
http://www.karensrecipes.com/recipes/detail.jsp?r=18
Again, just like the link I mentioned above, this is not an example of how to impress the search engines. Some kung foo regular expressions and a dab of JKMount knowledge and we now get something like this:
http://www.karensrecipes.com/recipes/18/Steamed_Mussels.jsp
and in your Apache httpd.conf:
RewriteEngine on
RewriteRule ^/recipes/([0-9]+)/.*$ /recipes/detail.jsp?r=$1 [PT]
which in English says something like “if the request starts with ‘/recipe/’ and then is followed by any number of digits and then is followed by a ‘/’ and any number of other characters, then rewrite the URL to this… (wanna know more about regular expressions? get this fabulous book!)
Pretty snazzy eh? It gives me warm feelings inside because my JSP/Servlet code doesn’t have any knowledge that funny stuff is being done to the URL in Apache, which means you can do all sorts of chicanery to your URL without having to change a lick of server side code.