{"id":189,"date":"2002-12-15T22:51:03","date_gmt":"2002-12-16T02:51:03","guid":{"rendered":"http:\/\/wordpress.cephas.net\/?p=189"},"modified":"2002-12-15T22:51:03","modified_gmt":"2002-12-16T02:51:03","slug":"verity-spider-tips-tricks","status":"publish","type":"post","link":"https:\/\/cephas.net\/blog\/2002\/12\/15\/verity-spider-tips-tricks\/","title":{"rendered":"Verity Spider tips &amp; tricks"},"content":{"rendered":"<p>Thanks to <a href=\"http:\/\/rubhub.com\">Phil<\/a> for sending me a link to the Verity Spider <a href=\"http:\/\/www.daemon.com.au\/index.cfm?objectid=64356A77-D0B7-4CD6-F933AE6E7646106F\">tips &amp; tricks<\/a> on <a href=\"http:\/\/www.daemon.com.au\/\">daemon.com.au<\/a>.  Daemon is\/was a big Spectra shop and probably used the spider to search Spectra sites on a regular basis.  So why doesn&#8217;t that page show up in a Google search for &#8220;<a href=\"http:\/\/www.google.com\/search?q=verity+spider&amp;btnG=Google+Search&amp;hl=en&amp;lr=&amp;ie=UTF-8&amp;oe=UTF-8\">verity spider<\/a>&#8221; or &#8220;<a href=\"http:\/\/www.google.com\/search?hl=en&amp;lr=&amp;ie=UTF-8&amp;oe=UTF-8&amp;q=verity+spider+tips\">verity spider tips<\/a>&#8221;?  Maybe it&#8217;s because of the way their content management system works, where each page is denoted by a CF UUID appended to the URL.  This method probably helps the developers, but in the long run it isn&#8217;t so good for getting ranked or even indexed by the larger search engines&#8230; which led me to today&#8217;s research: <a href=\"http:\/\/httpd.apache.org\/docs\/mod\/mod_rewrite.html\">mod_rewrite<\/a>.  I got my MCSE from Microsoft a couple of years ago, so my first exposure to web servers was IIS.  IIS was then, and for the most part still is, very pointy-clicky (although I&#8217;ve heard that .NET IIS will have a text-based configuration file). Anyway, Apache wasn&#8217;t something I played with much until the last year, when I brought up a couple of Linux machines and thus Apache. 
So today I dove headfirst into mod_rewrite and came up with a solution for making the next version (due out any day now) of <a href=\"http:\/\/www.karensrecipes.com\/\">karensrecipes.com<\/a> more search engine friendly. In short, to get to a recipe on the development site right now, you&#8217;d type in something like this:<\/p>\n<p>http:\/\/www.karensrecipes.com\/recipes\/detail.jsp?r=18<\/p>\n<p>Again, just like the link I mentioned above, this is not an example of how to impress the search engines.  Some kung fu regular expressions and a dab of JkMount knowledge and we now get something like this:<\/p>\n<p>http:\/\/www.karensrecipes.com\/recipes\/18\/Steamed_Mussels.jsp<\/p>\n<p>and in your Apache httpd.conf:<\/p>\n<p>RewriteEngine on<br \/>\nRewriteRule ^\/recipes\/([0-9]+)\/.*$ \/recipes\/detail.jsp?r=$1 [PT]<\/p>\n<p>which in English says something like &#8220;if the request starts with &#8216;\/recipes\/&#8217;, is followed by one or more digits, and then by a &#8216;\/&#8217; and any number of other characters, then rewrite the URL to &#8216;\/recipes\/detail.jsp?r=&#8217; plus those digits.&#8221;  The [PT] (pass-through) flag hands the rewritten URL back to Apache&#8217;s URL mapping, so the JkMount for the JSP still gets a chance to pick it up.  (Wanna know more about regular expressions?  Get <a href=\"http:\/\/www.amazon.com\/exec\/obidos\/ASIN\/0596002890\/cephasnet-20\/\">this fabulous book<\/a>!)<\/p>\n<p>Pretty snazzy, eh? It gives me warm feelings inside because my JSP\/Servlet code doesn&#8217;t have any knowledge that funny stuff is being done to the URL in Apache, which means you can do all sorts of <a href=\"http:\/\/dictionary.reference.com\/search?q=chicanery\">chicanery<\/a> to your URL without having to change a lick of server-side code.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Thanks to Phil for sending me a link to the Verity Spider tips &amp; tricks on daemon.com.au. Daemon is\/was a big Spectra shop and probably used the spider to search Spectra sites on a regular basis. So why doesn&#8217;t that page show up in a Google search for &#8220;verity spider&#8221; or &#8220;verity spider tips&#8221;? 
Maybe &hellip; <a href=\"https:\/\/cephas.net\/blog\/2002\/12\/15\/verity-spider-tips-tricks\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Verity Spider tips &amp; tricks<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[7,5,3,4,12],"tags":[],"_links":{"self":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/189"}],"collection":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/comments?post=189"}],"version-history":[{"count":0,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/189\/revisions"}],"wp:attachment":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/media?parent=189"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/categories?post=189"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/tags?post=189"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}