Search-Enable Your Application with Lucene

I was reading this month’s Java Developer’s Journal while exercising today, specifically the article titled “Search-Enable Your Application with Lucene”. A couple of months ago, when I first added Lucene searching to this site, I thought it would be a great feature to be able to index a URL. So, for example, when creating and updating an index of files in a directory on the file system, you’d do something like this:

// true = create a new index, replacing any existing one
IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
File file = new File("c:\\htmlToIndex");
String[] files = file.list();
for (int i = 0; i …

… Verity Spidering. Very nice! So I guess the same code I mentioned above could be done from the command line like so:

c:\cfusionmx\lib\_nti40\bin\vspider -common c:\cfusionmx\lib\common -collection c:\new -start http://www.mysite.com/products/? -indinclude *

But one of the advantages Lucene has over a product like Verity is the ability to customize indexing and searching routines. For instance, one of the examples the author (Craig Walls) gave was adding synonym matching to your indexing routine. Basically, in Lucene, if you want to add synonyms to keywords, you subclass TokenFilter with a short bit of code (he provided an example in the source code) and you’re done. To the best of my knowledge, you can’t do that with Verity. Correction: you can’t “extend” Verity, but it comes with a feature similar to the above-mentioned synonym feature, called “THESAURUS” (“Expands the search to include the word that you enter and its synonyms”). I’ve not spent much time with Verity, but the evidence operators on the CFMX docs page are really intriguing, specifically “THESAURUS”, “SOUNDEX” and “TYPO/N”.
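
To make the synonym idea a bit more concrete, here’s a rough plain-Java sketch of what a synonym filter does conceptually: every token passes through, and any token with known synonyms gets those synonyms added right after it. This is not Lucene’s TokenFilter API and it isn’t the code from the article; the SynonymExpander class, its expand method and the sample synonym map are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): expands a token list with synonyms.
class SynonymExpander {
    private final Map<String, List<String>> synonyms;

    SynonymExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Returns the original tokens in order, each followed by its synonyms (if any).
    List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            out.addAll(synonyms.getOrDefault(token, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        SynonymExpander expander = new SynonymExpander(
                Map.of("fast", List.of("quick", "speedy")));
        // prints [a, fast, quick, speedy, car]
        System.out.println(expander.expand(List.of("a", "fast", "car")));
    }
}
```

In a real Lucene filter the injected synonyms would typically be emitted at the same position as the original token rather than appended as extra words, but the basic “look up each token, emit extras” loop is the same idea.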

Notes from my .NET/ASP.NET reading

Notes from my .NET/ASP.NET reading back in September…

ildasm — command line tool for viewing manifest file contents

wincv — allows you to quickly look up information about a class or series of classes, based on a search pattern

All aspx pages are compiled to a subdirectory of the .NET framework folder; you can change the path to the compiled files by editing machine.config

compilers:
csc for C#
vbc for Visual Basic .NET
jsc for JScript .NET

pattern recognizers

Scoble posted a great analogy 2 days ago which was ignored amidst the religious context. He said this:

“Our brains are extraordinary pattern recognizers (think about it sometime — why can you look at a tree and instantly recognize it as a tree?). Our brains totally freak out when presented with something that has no pattern. Hey, look at the white noise on your TV sometime. You’ll start seeing patterns. You brain HATES not being able to see patterns.”

Ten minutes ago I was trying to figure out why I’m so frustrated with the information architecture of the site I’m developing at work, and the idea of the brain as a pattern recognizer makes perfect sense of that frustration. The client (in this case represented by anywhere from 1 to 8 people of a 1,200-person company) has decided that they want a 3 column layout AND a 2 column layout and a single column layout… each with a different navigation scheme, and in no apparent order. This frustrates me to no end, but there’s not much I can do or say to change their minds.
