QueryParser … in NLucene

Misleading title. I implemented the first of the examples that Erik Hatcher used in his article about the Lucene QueryParser, only I used NLucene. Lucene and NLucene are very similar, so if anything, it’s interesting only because it highlights a couple of the differences between C# and Java.

First, here’s the Java example taken directly from Erik’s article:

public static void search(File indexDir, String q) {
  Directory fsDir = FSDirectory.getDirectory(indexDir, false);
  IndexSearcher is = new IndexSearcher(fsDir);
  Query query = QueryParser.parse(q, "contents", new StandardAnalyzer());
  Hits hits = is.search(query);
  System.out.println("Found " + hits.length() +
    " document(s) that matched query '" + q + "':");
  for (int i = 0; i < hits.length(); i++) {
    ...
  }
}

The NLucene version looks eerily similar:

public static void Search(DirectoryInfo indexDir, string q) {
  DotnetPark.NLucene.Store.Directory fsDir = FsDirectory.GetDirectory(indexDir, false);
  IndexSearcher searcher = new IndexSearcher(fsDir);
  Query query = QueryParser.Parse(q, "contents", new StandardAnalyzer());
  Hits hits = searcher.Search(query);
  Console.WriteLine("Found " + hits.Length +
    " document(s) that matched query '" + q + "':");
  for (int i = 0; i < hits.Length; i++) {
    ...
  }
}

The differences are mainly syntax.

First, Erik used the variable name 'is' for his IndexSearcher. In C# 'is' is a keyword, so I switched the variable name to 'searcher'. If you're really geeky, you might want to brush up on all the Java keywords and the C# keywords.

Second, while Java uses the File class to describe both directories and files, the .NET Framework uses the DirectoryInfo class for directories (and FileInfo for files).

Third, Java programmers are encouraged to capitalize class names and use camelCase for method and variable names, while C# programmers are encouraged to use PascalCase for methods and camelCase for variables, so I switched the static method name from 'search' to 'Search'.

Next, 'Directory' is also the name of a .NET Framework class (System.IO.Directory), so the reference to the NLucene directory needed to be fully qualified:

DotnetPark.NLucene.Store.Directory fsDir = FsDirectory.GetDirectory(indexDir, false);

rather than this:

Directory fsDir = FsDirectory.GetDirectory(indexDir, false);
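
Java has the same kind of clash, and the same fix. As a minimal sketch that is not from the original post and uses only JDK classes as stand-ins: java.util.Date is imported, so the unrelated java.sql.Date has to be spelled out in full, much like DotnetPark.NLucene.Store.Directory above. The names QualifiedNameDemo and toSqlDate are made up for illustration.

```java
import java.util.Date;

// Hypothetical demo class; not part of Lucene or NLucene.
class QualifiedNameDemo {

    // java.sql.Date shares its simple name with the imported java.util.Date,
    // so it must be referenced by its fully qualified name.
    static java.sql.Date toSqlDate(Date utilDate) {
        return new java.sql.Date(utilDate.getTime());
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        java.sql.Date sqlDate = toSqlDate(epoch);
        // Both objects wrap the same millisecond timestamp.
        System.out.println(sqlDate.getTime() == epoch.getTime());
    }
}
```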

Finally, the Hits class differs in a couple of ways. Java programmers use the length() method on a variety of classes, so it made sense for the Java version to use a length() method as well. C# has properties, which are nothing more than syntactic sugar that lets the API developer encapsulate the implementation of a value while still allowing access to it as if it were a public field. The end result is that instead of writing:

for (int i = 0; i < hits.length(); i++)

in Java, you'd use this in C#:

for (int i = 0; i < hits.Length; i++)

The authors of NLucene also decided to use the C# indexer functionality (which I wrote about a couple days ago) so that an instance of the Hits class can be accessed as if it were an array:

Document doc = hits[i].Document;
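
For comparison, Java has neither properties nor indexers, so the same encapsulation is written as ordinary methods. The sketch below is a hypothetical stand-in: SimpleHits and HitsDemo are made-up names, and it stores plain strings rather than Lucene documents. It only illustrates the length() and doc(i) style of access that the C# property and indexer replace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hits-like result set; not the real Lucene class.
final class SimpleHits {
    private final List<String> docs;

    SimpleHits(List<String> docs) {
        // Defensive copy so callers cannot change the results afterwards.
        this.docs = new ArrayList<>(docs);
    }

    // Accessor method where C# would expose a Length property.
    int length() {
        return docs.size();
    }

    // Index-taking method where C# would expose an indexer (hits[i]).
    String doc(int i) {
        return docs.get(i);
    }
}

class HitsDemo {
    // Builds one line per hit, using only the two accessor methods.
    static String describe(SimpleHits hits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hits.length(); i++) {
            sb.append(i).append(": ").append(hits.doc(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleHits hits = new SimpleHits(Arrays.asList("a.txt", "b.txt"));
        System.out.print(describe(hits));
    }
}
```

In C#, the loop condition would read hits.Length and the body hits[i], with no change to what the class does internally.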

I put together a complete sample that you can download and compile yourself if you're interested in using NLucene. Download it here.

7 thoughts on “QueryParser … in NLucene”

  1. I read the posting on your QueryParser implementation in NLucene. I was
    wondering if you wouldn’t mind me asking some questions based on your
    experiences. I’m trying to implement a searching solution for a website that
    I’m developing and I’m sampling NLucene as an option.

    What was your experience with using NLucene? Have you used this before as a
    searching solution for websites you’ve created? How were you able to index
    dynamic aspx content? Where can I find additional resources on set up?

    If you have some free time to answer these questions I would seriously
    appreciate it. Thanks in advance for your time and I hope to hear from you
    soon.

  2. hey Chris,

    >> What was your experience with using NLucene? Have you used this before as a
    searching solution for websites you’ve created?
    - I haven’t used it in production, but I have used Lucene, which NLucene is obviously based on. It’s reasonably fast, stable and provides good search results.

    >> How were you able to index dynamic aspx content?
    - Ah… I’m sure this is a popular question. What you’re really asking is ‘can it spider my dynamic site?’ and the answer to that is no. You can do one of three things: a) you can write a spider to traverse your site and then have the spider use NLucene to index the content that it finds (a good solution, but lots of work), b) you can write code that extracts the relevant dynamic data from your database and builds an index using NLucene against that (less work, but may not always reflect the content on your site), or c) you can ‘spider’ your file system and just read the contents of the files unrendered by IIS (which probably is not a good option if you have a dynamic website).

    >> Where can I find additional resources on set up?
    - Wow… I thought there would have been some resources on setting up and using NLucene on the web. Not much is there? NLucene is conceptually very similar to Lucene, so you might want to check out the Lucene project for more information on setup, although it’s going to be Java/Lucene specific. I’ll try to get something up on my site soon that talks about setting up and using NLucene.

  3. Hi Mr. Tagbo!
    I’ve read a lot of information about you. I’m currently implementing an Association Rule Mining project, and I found that Apriori is the most suitable algorithm for me. Can you send me such source code in VB? Many thanks in advance.
    Your websites… I don’t know how to say… I cannot reach them.
