{"id":188,"date":"2002-12-14T23:34:08","date_gmt":"2002-12-15T03:34:08","guid":{"rendered":"http:\/\/wordpress.cephas.net\/?p=188"},"modified":"2002-12-14T23:34:08","modified_gmt":"2002-12-15T03:34:08","slug":"search-enable-your-application-with-lucene","status":"publish","type":"post","link":"https:\/\/cephas.net\/blog\/2002\/12\/14\/search-enable-your-application-with-lucene\/","title":{"rendered":"Search-Enable Your Application with Lucene"},"content":{"rendered":"<p>Reading <a href=\"http:\/\/www.sys-con.com\/java\/archivesa.cfm?volume=07&amp;issue=12\">this month&#8217;s<\/a> Java Developers Journal while exercising today, specifically, the article titled &#8220;<a href=\"http:\/\/www.sys-con.com\/java\/article.cfm?id=1777\">Search-Enable Your Application with Lucene<\/a>&#8221;.  <a href=\"http:\/\/cephas.net\/blog\/archives\/000008.html#000008\">Back<\/a> a couple months ago when I first added Lucene searching to this site, I thought it would have been a great feature to be able to index a URL. So, for example, when creating and updating an index of files in a directory on the file system, you&#8217;d do something like this:<\/p>\n<p>IndexWriter writer = new IndexWriter(&#8220;index&#8221;, new StandardAnalyzer(), true);<br \/>\nFile file = new File(&#8220;c:\\htmlToIndex&#8221;);<br \/>\nString[] files = file.list();<br \/>\nfor (int i = 0; i Verity Spidering<\/a>.  Very nice!  So I guess the same code I mentioned above could be done from the command line like so:<\/p>\n<p>c:\\cfusionmx\\lib\\_nti40\\bin\\vspider -common c:\\cfusionmx\\lib\\common -collection c:\\new -start http:\/\/www.mysite.com\/products\/? -indinclude *<\/p>\n<p>But one of the advantages that Lucene has over a product like Verity is the ability one has to customize indexing and searching routines.  For instance, one of the examples the author (Craig Walls) gave was the ability to add synonym-matching capability in your indexing routine.  
Basically, in Lucene, if you want to add synonyms to keywords, you subclass <a href=\"http:\/\/jakarta.apache.org\/lucene\/docs\/api\/org\/apache\/lucene\/analysis\/TokenFilter.html\">TokenFilter<\/a> by writing a short bit of code (he provided an example in the <a href=\"http:\/\/www.sys-con.com\/java\/source.cfm?id=1777\">source code<\/a>), and you&#8217;re done. To the best of my knowledge, you can&#8217;t do that with Verity. Correction: you can&#8217;t &#8220;extend&#8221; Verity&#8230; but it comes with a feature similar to the above-mentioned &#8216;synonym&#8217; feature, called &#8220;THESAURUS&#8221; (&#8220;Expands the search to include the word that you enter and its synonyms&#8221;). I&#8217;ve not spent much time with Verity, but the evidence operators on the <a href=\"http:\/\/livedocs.macromedia.com\/cfmxdocs\/Developing_ColdFusion_MX_Applications_with_CFML\/indexSearch025.jsp\">CFMX docs<\/a> page are really intriguing, specifically the &#8220;THESAURUS&#8221;, &#8220;SOUNDEX&#8221; and &#8220;TYPO\/N&#8221; evidence operators.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Reading this month&#8217;s Java Developers Journal while exercising today, specifically, the article titled &#8220;Search-Enable Your Application with Lucene&#8221;. Back a couple months ago when I first added Lucene searching to this site, I thought it would have been a great feature to be able to index a URL. 
So, for example, when creating and updating &hellip; <a href=\"https:\/\/cephas.net\/blog\/2002\/12\/14\/search-enable-your-application-with-lucene\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Search-Enable Your Application with Lucene<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"_links":{"self":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/188"}],"collection":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/comments?post=188"}],"version-history":[{"count":0,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/188\/revisions"}],"wp:attachment":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/media?parent=188"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/categories?post=188"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/tags?post=188"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}