Aaron Johnson Now with 50% less caffeine!

Posts Tagged Lucene

why create jSearch?

One of the comments posted to the blog entry introducing jSearch asked why I thought it needed to be created when a tool like Nutch already exists. Nutch is a massive undertaking; its aim is to create a spider and search engine capable of spidering, indexing and searching billions of web pages while also providing [...]


Inverted Index

Catching up on links… Matt Quail wrote about Lucene and its use of an inverted index a couple months ago and then today John Battelle linked to ‘backrub’ (Google before Google existed) which also mentions the use of an inverted index.
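
For anyone who hasn't run into the term, here's a minimal sketch of the idea in Python. It is not how Lucene or BackRub actually implement it; the function names, the tokenizer and the sample documents are all made up for illustration. An inverted index maps each term to the documents that contain it, so a search becomes a lookup rather than a scan of every document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene uses an inverted index",
    2: "BackRub also used an inverted index",
    3: "ColdFusion can call Lucene",
}
index = build_inverted_index(docs)
```

A real engine also stores term positions and frequencies for phrase queries and ranking; this sketch only records which documents contain which terms.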


Wanted: Extracting summary from HTML text

As part of a project I’m working on I need to extract content from an HTML page, in some sense creating a short 200-character summary of the document. Google does a fantastic job of extracting text and presenting a summary of the document in their search listings; I’m wondering how they do that. [...]
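
A naive baseline, sketched in Python with only the standard library, is to strip the markup and keep the first 200 characters of visible text. This is certainly not what Google does (their summaries depend on the query), and the class and function names here are hypothetical; it's just a starting point under the assumption that the top of the page is representative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <head>, <script> or <style>."""

    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def summarize(html, limit=200):
    """Return at most `limit` characters of visible text, cut at a word boundary."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text chunks and collapse all runs of whitespace to single spaces.
    text = " ".join(" ".join(parser.parts).split())
    if len(text) <= limit:
        return text
    cut = text[: limit - 3]  # leave room for the trailing "..."
    if " " in cut:
        cut = cut[: cut.rfind(" ")]  # back up so a word is not split
    return cut.rstrip() + "..."
```

Joining chunks with a space keeps block elements apart but can leave a stray space around inline tags; for a short summary that is usually tolerable.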


Indexing Database Content with Lucene & ColdFusion

Terry emailed me a couple days ago wondering how he could use ColdFusion and Lucene to index and then search a database table. Since we’re completely socked in here in Boston, I had nothing better to do today than hack together a quick snippet that does just that:

<cfset an = CreateObject("java", "org.apache.lucene.analysis.StopAnalyzer")>
<cfset an.init()>
<cfset writer [...]


QueryParser … in NLucene

Misleading title. I implemented the first of the examples that Erik Hatcher used in his article about the Lucene QueryParser, only I used NLucene. Lucene and NLucene are very similar, so if anything, it’s interesting only because it highlights a couple of the differences between C# and Java.
First, here’s the Java example [...]


Lucene’s Query API

Erik Hatcher wrote an excellent article on Lucene’s Query API, specifically on how the QueryParser class uses the Query subclasses, including TermQuery, PhraseQuery, RangeQuery, WildcardQuery, PrefixQuery, FuzzyQuery and BooleanQuery. Very useful stuff.
Not surprisingly, he’s also writing a book on Lucene titled “Lucene in Action”, to be published by Manning.


Highlighting search results with Lucene 1.3

Mark (who looks like a pretty smart cookie) has put together some code that gives Lucene 1.3 users the ability to highlight terms in searched documents. You can read more about it here and download the software (zip) here.


ColdFusion Developer’s Journal: Writing a Java-based CFX Tag

The second of my articles in ColdFusion Developer’s Journal is available on the website now: Extending ColdFusion with Java: Writing a Java-based CFX Tag. It’s a follow-up to the last article I wrote; this one explains how you can write a Java CFX tag, and the example again uses Lucene.


Lucene.Net 1.3.rc1 now available

From lucene-user: Lucene.Net search engine: “Lucene.Net is a complete, up-to-date .NET port of Jakarta Lucene, a high-performance, full-featured text search engine written entirely in Java. See http://jakarta.apache.org/lucene for more info on Jakarta Lucene.”


Posted
14 July 2003 @ 12pm

Tagged
Lucene

Lucene Index Browser

From the lucene-user list today: Lucene Index Browser: Luke is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display their contents in several ways:
· browse by document number, or by term
· view documents / copy to clipboard
· retrieve a ranked list of most frequent terms
· execute a [...]

