Indexing Database Content with Lucene & ColdFusion
Terry emailed me a couple of days ago wondering how he could use ColdFusion and Lucene to index and then search a database table. Since we’re completely socked in here in Boston, I had nothing better to do today than hack together a quick snippet that does just that:
<cfset an = CreateObject("java", "org.apache.lucene.analysis.StopAnalyzer")>
<cfset an.init()>
<cfset writer = CreateObject("java", "org.apache.lucene.index.IndexWriter")>
<cfset writer.init("C:\mysite\index\", an, true)>
<cfquery name="contentIndex" datasource="sample">
select label, description, id
FROM product
</cfquery>
<cfloop query="contentIndex">
<cfset d = CreateObject("java", "org.apache.lucene.document.Document")>
<cfset fld = CreateObject("java", "org.apache.lucene.document.Field")>
<cfset content = contentIndex.description>
<cfset title = contentIndex.label>
<cfset urlpath = "/products/detail.cfm?id=" & contentIndex.id>
<cfset d.add(fld.Keyword("url", urlpath))>
<cfset d.add(fld.Text("title", title))>
<cfset d.add(fld.UnIndexed("summary", content))>
<cfset d.add(fld.UnStored("body", content))>
<cfset writer.addDocument(d)>
</cfloop>
<cfset writer.close()>
The only real change from the code that I wrote previously to index a document is that instead of looping over the file system looking for documents, I loop over a query and index the text of a database column rather than the text of a file. (I would have written it in CFScript, but you can’t do queries from CFScript yet, unless you wrap the query in a UDF.)
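For the curious, the UDF workaround might look something like the sketch below. The function name runQuery and its two arguments are my own invention for illustration, not anything built in, and because it passes raw SQL straight through it should only ever be fed trusted, hard-coded statements:

```cfml
<!--- Hypothetical helper: wraps cfquery in a UDF so CFScript can call it --->
<cffunction name="runQuery" returntype="query" output="false">
<cfargument name="dsn" type="string" required="true">
<cfargument name="sql" type="string" required="true">
<!--- Keep the result local to the function --->
<cfset var result = "">
<cfquery name="result" datasource="#arguments.dsn#">
#PreserveSingleQuotes(arguments.sql)#
</cfquery>
<cfreturn result>
</cffunction>

<cfscript>
// Same query as the tag-based version, now callable from CFScript
contentIndex = runQuery("sample", "select label, description, id FROM product");
</cfscript>
```

From there the loop over contentIndex could be rewritten in CFScript as well, using a for loop over contentIndex.recordCount.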
You can download the source here, if you’re so inclined.
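As for the search half of Terry's question, here is a rough sketch of what querying that index could look like. It assumes the same Lucene 1.x jar that provides Field.Keyword and friends (the static QueryParser.parse call and the Hits class used here belong to that era of the API and were changed in later releases), the same index path as above, and a made-up search term:

```cfml
<!--- Hypothetical search term, for illustration only --->
<cfset searchTerm = "widget">

<!--- Use the same analyzer that was used to build the index --->
<cfset an = CreateObject("java", "org.apache.lucene.analysis.StopAnalyzer")>
<cfset an.init()>

<cfset searcher = CreateObject("java", "org.apache.lucene.search.IndexSearcher")>
<cfset searcher.init("C:\mysite\index\")>

<!--- parse() is a static method in Lucene 1.x, so no init() is needed --->
<cfset parser = CreateObject("java", "org.apache.lucene.queryParser.QueryParser")>
<cfset q = parser.parse(searchTerm, "body", an)>
<cfset hits = searcher.search(q)>

<!--- Hits are zero-based; the loop is skipped when there are no matches --->
<cfoutput>
<cfloop from="0" to="#hits.length() - 1#" index="i">
<cfset hit = hits.doc(JavaCast("int", i))>
<a href="#hit.get('url')#">#HTMLEditFormat(hit.get("title"))#</a><br>
</cfloop>
</cfoutput>

<cfset searcher.close()>
```

The query runs against the body field, which was indexed but not stored, while the url and title fields are stored and can be read back from each hit.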