Category Archives: Clearspace

Clearspace, Lucene, Software Development

How MoreLikeThis Works in Lucene

March 30, 2008 ajohnson 33 Comments

We created a ‘related items’ feature way back in Clearspace 1.0 (I mocked some of it out here just to prove that it worked) which shows related content based on the document, thread or blog post that you’re currently viewing. It was built using the MoreLikeThis class, a contribution made to the Lucene project by David Spencer, Mark Harwood and my favorite Canadian coworker Bruce Ritchie.

The really interesting thing about MoreLikeThis (at least to me) is that my first inclination was probably to think that Lucene goes and spends some cycles looking for content related to the source document: the focus being on the search and the other content. The reality is that the majority of the work is done by analyzing content the source document relative to the aggregate of all the other content that exists in the index. Said another way, MoreLikeThis doesn’t work by running some special search or query, it works by comparing the document that you’re asking about and the entire index as a whole (and then by running a query).

Anyway, in general I think it works pretty well, but for some odd reason, people keep asking the question “how does it work?” and since the documentation doesn’t go into much detail, I thought I’d try writing it up. So here goes nothing…

The first thing you need to do if you’re going to find related content is to tell Lucene what you want to find related content for. The MoreLikeThis class gives you five options: you can either tell it to look at an existing document in the Lucene index or let it try to parse an external resource from a file, an inputstream, a reader or a URL. In Clearspace we actually have a way of getting the Lucene document ID given the Clearspace object type and object ID, the code for our ‘related items’ feature looks something like this:

Query q = buildObjectTypeAndIDQuery(String.valueOf(messageID), JiveConstants.MESSAGE);
Hits hits = searcher.search(q);
...
int docNum = hits.id(0);
MoreLikeThis mlt = new MoreLikeThis(reader);
mlt.setFieldNames(new String[]{IndexField.subject.toString(), IndexField.body.toString(),
                    IndexField.tagID.toString()});
mlt.setMinWordLen(2);
mlt.setBoost(true);
q = mlt.like(docNum);

So in Clearspace we first fetch the Lucene document, construct a MoreLikeThis instance, tell it that we want to match on the subject, body and tag fields and we only want to calculate ‘relatedness’ based on words whose length is two characters or more. That’s the easy part, now it gets interesting.

Given that you already have a Lucene document, the MoreLikeThis instance loops over the field names (fields are kind of like the Lucene equivalent of database columns) that we specified (or all the field names available in the Lucene index if you don’t specify any) and retrieves a term vector for each of the fields in the document we’re analyzing. A term vector is a data structure that holds a list of all the words that were in the field and the number of times each word was used (excluding words that it considers to be ‘stop words’, you can see a list of common stop words here).

Since I’m sure some of you are tuning out at this point, let’s look at something concrete: assume you’ve created a document that looks like this (by the way, what a thorough wikipedia.org entry!):

Subject: Twinkle Twinkle Little Star
Body: Twinkle Twinkle little star, How I wonder what you are! Up above the world so high, Like a diamond in the sky, Twinkle twinkle little star, How I wonder what you are!

The term vector for the subject would look like this:
terms: twinkle, little, star
termFrequencies: twinkle[2], little[1], star[1]

and for the body:
terms: [above, diamond, high, how, like, little, sky, so, star, twinkle, up, what, wonder, world]
termFrequencies: above[1], diamond[1], high[1], how[2], like[1], little[2], sky[1], so[1], star[2], [twinkle[4], up[1], what[2], wonder[2], world[1]

Meanwhile, back at the ranch.. so we get a term vector for each of the fields available in the document we’re looking for related items for. After we’ve got those, each of the term vectors is merged into a map: the key being the term and the value being the number of times the word was used in the document. The map is then handed to a method (called createQueue) that calculates a reasonably complex score for each word in the map. Since this is really the meat of the entire class, I’ll step through this in a little more detail and I’ll season the meat with data from the index on my blog.

The createQueue method first retrieves the total number of documents that exist in the index that we’re dealing with.

int numDocs = ir.numDocs();

In the index I maintain for my personal blog, there are 998 documents in the index. The second thing it does is create an instance of a class called FreqQ, which extends the PriorityQueue class. This object will maintain an object array whose elements are ordered according to their score, which we’ll cover in a second.

Now the createQueue method iterates over each word in the term frequency map, throwing out words if they don’t occur enough times (see setMinTermFreq) and then testing to find out which field across the entire the Lucene index contains the term the most. Next it calculates the Inverse Document Frequency, which, according to the JavaDoc, is:

… a score factor based on a term’s document frequency (the number of documents which contain the term)… Terms that occur in fewer documents are better indicators of topic, so implementations of this method usually return larger values for rare terms, and smaller values for common terms.

and finally, a score, which is a product of the IDF score and the number of times the word existed in the source document.

Again, I’m guessing you’re getting really sleepy, here’s some real data to look at. Like I mentioned above, I have 998 documents in my Lucene index and if we take a document like this one, you’ll end up with a term vector that looks like this:

a[20], you[20], pre[18], the[17], to[13], of[11], this[10], username[10], column[9], oracle[9], and[8], if[8], null[8], alter[7], appuser[7], into[7], on[7], that[7], href[6], http[6], i[6], sql[6], an[5], com[5], is[5], not[5], password[5], table[5], values[5], varchar[5], www[5], be[4], but[4], case[4], clearspace[4], empty[4], get[4], insert[4], modify[4], which[4], administrator[3], are[3], can[3], database[3], db[3], in[3], jivesoftware[3], mysql[3], nullability[3], server[3], string[3], t[3], thing[3], with[3], about[2], all[2], assume[2], been[2], blogged[2], d[2], different[2], feature[2], for[2], from[2], has[2], have[2], hsqldb[2], jive[2], make[2], modified[2], nvarchar[2], out[2], plan[2], postgres[2], products[2], ran[2], requirements[2], sensitivity[2], so[2], space[2], store[2], strike[2], support[2], where[2], will[2], above[1], again[1], against[1], already[1], andrew[1], any[1], appears[1], application[1], at[1], attempt[1], bennett[1], bit[1], blog[1], borrowed[1], both[1], bottom[1], by[1], cannot[1], changing[1], chars[1], classes[1], cmu[1], code[1], columns[1], consider[1], contrib[1], converted[1], cool[1], couple[1], crazy[1], day[1], decided[1], default[1], detail[1], didn[1], discuss[1], do[1], doing[1], edu[1], error[1], example[1], finally[1], first[1], flushed[1], forums[1], further[1], going[1], good[1], goodness[1], heads[1], helpful[1], his[1], insensitive[1], instead[1], interesting[1], involved[1], issue[1], issues[1], it[1], itself[1], jist[1], job[1], jsp[1], kb[1], kind[1], lately[1], least[1], like[1], line[1], ll[1], look[1], looks[1], lot[1], mcelwee[1], microsoft[1], more[1], my[1], needed[1], nice[1], no[1], notice[1], nugget[1], number[1], only[1], or[1], ora[1], over[1], page[1], past[1], platforms[1], probably[1], product[1], re[1], readers[1], reading[1], reason[1], respect[1], results[1], retrieve[1], ridiculous[1], sample[1], second[1], see[1], select[1], semicolon[1], sensitive[1], servers[1], shadow[1], similar[1], six[1], software[1], something[1], specific[1], specification[1], specify[1], standard[1], statement[1], supporting[1], sure[1], system[1], tails[1], than[1], then[1], there[1], they[1], things[1], those[1], thought[1], thunderguy[1], tolowercase[1], touppercase[1], tried[1], try[1], two[1], txt[1], unless[1], ups[1], used[1], user[1], variety[1], ve[1], very[1], want[1], warned[1], was[1], we[1], week[1], whatever[1], wonderful[1], word[1], work[1], write[1], yes[1], your[1], yourself[1]

and stop words would get stripped out and words that occurred less than two times would get stripped out so you’d be left with a term frequency map that looked like this:

pre[18], username[10], column[9], oracle[9], alter[7], appuser[7], href[6], http[6], sql[6], com[5], password[5], table[5], values[5], varchar[5], www[5], case[4], clearspace[4], empty[4], get[4], insert[4], modify[4], which[4], administrator[3], can[3], database[3], db[3], jivesoftware[3], mysql[3], nullability[3], server[3], string[3], thing[3], about[2], all[2], assume[2], been[2], blogged[2], d[2], different[2], feature[2], from[2], has[2], have[2], hsqldb[2], jive[2], make[2], modified[2], nvarchar[2], out[2], plan[2], postgres[2], products[2], ran[2], requirements[2], sensitivity[2], so[2], space[2], store[2], strike[2], support[2], where[2],

So for each of these terms, Lucene finds the field that contains the most instances of the given term and then calculates the idf value and the score. The default implementation of the idf value looks like this:

return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0)

So given the term frequency map above and the document frequency noted below, we get the following idf values and scores for the most popular terms in this document:

Term	Number of Instances of Term in Document	Number of Documents Matching Term	IDF value	Score
pre	18	26	4.609916	82.978
username	10	23	4.7276993	47.276
column	9	13	5.266696	47.400264
oracle	9	8	5.7085285	51.376
alter	7	1	7.212606	50.488

Finally, after all the terms have been added to the priority queue, we create a Lucene query, looping over the first 25 terms (this is the default and can be changed via setMaxQueryTerms) in the queue

TermQuery tq = new TermQuery(new Term("body", "oracle"));

and optionally boosting each term according to the score:

tq.setBoost(51.376 / 82.978);

The resulting Lucene query (in string format) looks something like this:

body:pre body:username^.56974 body:column^.57123 body:oracle^.61915 ...

Now I’ll be curious to see what the related item is for this post!

I hope you learned as much as I did. If you’re extremely curious about this whole Lucene thing and you like the in-depth stuff, you should definitely check out Luke, it’ll open a whole new world to you.

Clearspace, J2EE, JavaScript, Rich Internet Applications, Software Development, work

Creating a Firefox Sidebar for Clearspace: Part II

December 1, 2007 ajohnson 1 Comment

It looks like it was almost 2 months ago that I wrote a blog post about the Clearspace plugin for Firefox (called Clearfox), promising that I would follow up with the details on the JavaScript side of the project. I guess time flies when you’re having fun.

Getting started on the JavaScript sidebar was easy. The Mozilla folks have a nice document here that shows you how to get a really simple sidebar created, but like a lot of things in software, the last 20% of the features take 80% of the time. It’s also worth noting that Firefox extensions are deployed as an XPI file, themselves nothing more than glorified zip files, so if you’re curious about how an extension (say Firebug, LiveHTTPHeaders or Del.icio.us Bookmarks), you can download the extension, unzip it’s contents and then poke around to your hearts content. It’s just like viewing source on HTML, which is something that other browser extensions don’t offer out of the box. Here are couple things that either weren’t documented in the above document or caused me to repeatedly bang my head against the wall.

Adding Your Icon

When I originally wrote about the plug-in, a number of people asked “why?” I think one of most important things about browser plug-ins is that they bring your web application (in this case Clearspace) front and center in your customers everyday browsing experience. Front and center in Firefox means that your plug-in sits right next to the Back | Forward | Reload | Stop | Home button in the navigation bar. If you want to put a 24×24 picture of yourself in that spot, be my guest. Since the purpose of Clearfox was to get embed Clearspace in the browser, I chose (wisely I think) to use the Clearspace logo. This was one of the simpler things to accomplish. I added the following element to the overlay.xul:

<toolbox id="navigator-toolbox">
    <toolbarpalette id="BrowserToolbarPalette">
      <toolbarbutton id="clearfox-button" class="clearfoxbutton-1 chromeclass-toolbar-additional"
                     observes="viewClearfoxSidebar" />
    </toolbarpalette>
  </toolbox>

and then in a CSS file (that you also reference in overlay.xul), I added this:

#clearfox-button {
  list-style-image: url("chrome://clearfox/skin/clearfox.png");
  -moz-image-region: rect(0px 24px 24px 0px);
}

Finally, you’ll need to have execute some JavaScript when Firefox loads:

var toolbox = document.getElementById("navigator-toolbox");
var toolboxDocument = toolbox.ownerDocument;
var hasClearfoxButton = false;
for (var i = 0; i < toolbox.childNodes.length; ++i) {
  var toolbar = toolbox.childNodes[i];
  if (toolbar.localName == "toolbar" && toolbar.getAttribute("customizable") == "true" ) {
    if (toolbar.currentSet.indexOf("clearfox-button") > -1)
      hasClearfoxButton = true;
  }
}
if (!hasClearfoxButton) {
  for (var i = 0; i < toolbox.childNodes.length; ++i) {
    toolbar = toolbox.childNodes[i];
    if (toolbar.localName == "toolbar" &&  toolbar.getAttribute("customizable") == "true"
      && toolbar.id == "nav-bar") {
      var newSet = "";
      var child = toolbar.firstChild;
      while (child) {
        if (!hasClearfoxButton && (child.id=="clearfox-button" || child.id=="urlbar-container")) {
          newSet += "clearfox-button,";
          hasClearfoxButton = true;
        }
        newSet += child.id + ",";
        child = child.nextSibling;
      }
      newSet = newSet.substring(0, newSet.length-1);
      toolbar.currentSet = newSet;
      toolbar.setAttribute("currentset", newSet);
      toolboxDocument.persist(toolbar.id, "currentset");
      BrowserToolboxCustomizeDone(true)
      break;
    }
  }
}

The key takeaways: the ID of the toolbarbutton element is used in the CSS declaration, the list-style-image property is used to specify the image you want for the button and the observes attribute points to the ID of the broadcaster element, which is used for showing / hiding the sidebar. I'm not all that good with Fireworks / Photoshop so I didn't go the extra mile to create a separate image to show when a user mouses over the button (check out the the excellent del.icio.us bookmarks extension for an example), but adding a mouseover / hover image is as simple as adding another CSS property:

clearfox-button:hover {
  list-style-image: url("chrome://clearfox/skin/clearfox-hover.png");
}

Loading Sidebar Content

I think I mentioned this when I wrote the original post, but the thing that really got me started with the Firefox extension was looking at the source for the twitbin Firefox extension. When I checked out the source for that extension, I was stunned to learn that they were simply loading up an HTML page from their server and then refreshing the page every couple minutes using an AJAX request. I thought it must have been way more complex than that, but HTML + JavaScript + CSS with a little bit of XUL sprinkled in and you're golden. In the broadcaster element (mentioned above), you add an attribute 'oncommand' that tells Firefox what JavaScript method(s) you want invoked when the user clicks on your button; in Clearfox I use that hook to load the HTML content from Clearspace. The JavaScript looks like this:

var sidebar = top.document.getElementById("sidebar");
sidebar.contentWindow.addEventListener("DOMContentLoaded",
  Clearfox.clearfoxContentLoaded, false);
sidebar.loadURI('http://example.com/yourpage.html');

DOMContentLoaded

Once the content has loaded the DOMContentLoaded event is fired and then Clearfox runs the clearfoxContentLoaded method. I initially tried loading the page and then re-loading that same page every couple minutes: I was cheating and trying not have to do the AJAX part. I added a meta refresh tag to the page to have it reload every n minutes, but the DOMContentLoaded event was only fired the first time the page was loaded. Key takeaway: if you rely on DOMContentLoaded, you're only going to get the event once, even if your page reloads.

Opening New Tabs

You can run your own little application over in the sidebar if you want to, never opening new tabs or doing anything in the main window, but the main point of the Clearfox extension was to give users the ability to see new content in Clearspace and then able to view the full thread, document or blog post in their main browser window as a new tab. There's no way you can create new tabs in JavaScript outside of XUL, you have to be inside XUL to create a tab so the DOMContentLoaded event is important because this is where the links are all rewritten to open new tabs rather than work as normal links. There are two parts to the clearfoxContentLoaded method: the first rewrites all the links so that clicking on a link in the sidebar opens a new tab, the second adds listeners to the document so that when new content is added via AJAX, the same first part is repeated. The link conversion looks like this:

var sidebar = top.document.getElementById("sidebar");
var doc = sidebar.contentDocument;
var all_links = new Array();
var links = doc.evaluate("//a", doc, null, XPathResult.ANY_TYPE, null);
var link = links.iterateNext();
while (link) {
  all_links.push(link);
  link = links.iterateNext();
}
for (var i = 0; i < all_links.length; i++) {
  link = all_links[i];
  if (!link.hasAttribute('onclick') &&
    link.hasAttribute('href') &&
    (link.hasAttribute('class') && link.getAttribute('class').indexOf('tabbable') > -1)
  ) {
    var target = link.getAttribute('href');
    link.setAttribute('onclick', 'return false;');
    link.removeAttribute('target');
    link.addEventListener('click', Clearfox.clearfoxCreateOpenFunction(target), true);
  }
}

In English, get all the links that don't have an onclick attribute and add an onclick event, whose callback creates a new tab:

clearfoxCreateOpenFunction: function(url) {
  return function() {
    gBrowser.selectedTab = gBrowser.addTab(url);
  };
}

The second part of the clearfoxContentLoaded method listens for any modifications (anytime something is added to the document via an AJAX call, I remove the last n rows, so in this case I listen for the removal of nodes) to the document and adds handlers for the modification event:

var sidebar = top.document.getElementById("sidebar");
var table = sidebar.contentDocument.getElementById("clearfox-hidden-content");
table.addEventListener("DOMNodeRemoved", Clearfox.clearfoxConvertLinks, false);

Debugging

Two things about debugging your extension: a) none of the tools you've got installed for debugging JavaScript / CSS (cough! Firebug cough!) will work in the sidebar window and b) your only other option is to resort to the Firefox equivalent of Java's System.out.println:

function logClearfoxMsg(message) {
  var consoleService = Components.classes["@mozilla.org/consoleservice;1"].getService(Components.interfaces.nsIConsoleService);
  consoleService.logStringMessage("Clearfox: " + message);
}

and then to log a message:

logClearfoxMsg('your advertisement here');

To see the log message, go to Tools --> Error Console.

Showing The Sidebar When Firefox Opens

I spent way more time than I should have working on this last part: when you first install the extension, you're asked to restart Firefox, which you do and then you see the nice Clearspace button, you click on it and the sidebar opens and you're happy. Then a couple days later when you need to restart Firefox again (for whatever reason), you'll notice that the sidebar is open but that no content shows and you're sad. You probably didn't take it personally, but I did and I wanted to figure out why Firefox wouldn't load up the content if the sidebar was already open. There actually is onload attribute that I tried using in the sidebar.xul and that didn't work for reasons I won't get into here. What ended up doing the trick for me (and I really think it's a trick but I couldn't get anything else to work and this was just about the last option I had) was to check to see if the sidebar was open when Firefox was loading:

var broadcaster = top.document.getElementById('viewClearfoxSidebar');
  if (broadcaster.hasAttribute('checked')) {
    ...

and then, if it is open, *forcing* the sidebar to open again:

toggleSidebar('viewClearfoxSidebar', true);

For whatever reason, this was the only way I could get the rest of my code to work, but work it did. And now I'm happy.

And I hope you are too. If you have any questions about creating a Firefox sidebar, shoot me an email, I'll be glad to help. If you want to see all the code, you can download the Clearfox source over on the Jive Software Community site. All the JavaScript / XUL code I discussed in this post is in the 'xpi' directory off the root.

Clearspace, J2EE, Open Source, Software Development, WebWork, work

Creating a Firefox Sidebar for Clearspace: Part I

October 2, 2007 ajohnson 1 Comment

It’s been embarassingly quiet on this blog of late, I apologize for all the delicious links, although a case could be made that blogs were originally nothing more than sharing links so maybe I shouldn’t be apologizing, but that’s a different blog post. Today I want to talk about the thing I’ve been working on at night, my non-day job if you will. A couple months ago I was reading all the hype about how JavaScript is going to take over the world and I’d been doing a lot of JavaScript during the day but I needed a project to get me through the night. Right about that time was when Twitter started taking off and I came across twitbin, which is a cool Firefox sidebar that shows you all of your friends tweets in a Firefox sidebar (the same sidebar that livehttpheaders and selenium IDE show up in), updated in real time. Like any good hacker, I wondered “how’d they do that” and started poking around the xpi file that you download to install Firefox extensions. Lo and behold, it’s JavaScript and HTML behind the scenes. Since Clearspace is one of those addictive, constantly updating, can’t get enough of it kind of applications (unlike Twitter you can actually use more than 140 characters! amazing!), I thought it would be both useful and potentially easy to create a sidebar for Clearspace, which brings me to this blog post.

I’m not sure where I started, but I’m pretty sure the first step wasn’t to create a plugin in Clearspace, I messed around for awhile with the technologies that go into creating a Firefox Extension: JavaScript, XUL (pronounced ‘zool’), install manifests, etc. I got a sample Firefox sidebar up and running that and spent way more time than I should have installing, viewing, uninstalling and restarting Firefox than I should have. I spent a lot of time digesting these sites:

I’ll discuss some of the things I ran into on the Firefox / JavaScript side in a second blog post: right now I want to talk about the Clearspace side of the plugin.

Eventually, I got to a place where I was comfortable enough with the Firefox side of things to creating the Clearspace part of the plugin. The content that is displayed in the sidebar is nothing more than plain HTML and CSS. If you install the Clearspace plugin you can actually view the content in any browser: go to http://example.com/clearspace/cf-view.jspa (replacing example.com with the host name of your installation). You’ll see that the content looks surprisingly similiar to the content that shows up on the the homepage of Clearspace, and in fact, it is the same content. The one difference between this page and other pages in Clearspace is that it never does a page reload. All the views (the login page, the settings page, the view page, etc.) are included in the resulting HTML, but hidden using CSS so that you only see one view at a time. When you click a button at the top of the page, two things usually happen: a) the current view is hidden using CSS and the new view is displayed using CSS and b) an AJAX request is sent out to a WebWork action that performs some action: logging you out, getting the updated content since your last time you viewed the page, saving your Clearfox settings, etc. This process might seem a little backwards: why not just work the same way a regular web page browsing session works where you click a link and your browser loads another page? It’s actually a limitation imposed by the Firefox extension model, so I won’t go into it here, but if you’re curious, you can do a search for ‘DOMContentLoaded firefox extension’.

So now that you (hopefully) understand how the client works, I’m going to assume that you won’t have any problems copying the ‘example’ plugin that Clearspace ships with and dive right into the pieces that are distinctive about the Clearspace part of the Clearfox plugin.

First, I needed to define the WebWork actions that Clearfox was going to use, which means I needed to create the xwork-plugin.xml file. The descriptor for the view that I mentioned earlier looks like this:

<action name="cf-view"
   class="com.jivesoftware.clearspace.plugin.clearfoxplugin.ViewAction">
   <result name="success">/plugins/clearfox/resources/view.ftl</result>
   <result name="update">/plugins/clearfox/resources/update.ftl</result>
</action>

That action is pretty standard: there are two possible results: the ‘success’ result shows the list of the 25 most recently updated pieces of content, the update result is used by an AJAX request to update the existing page with the n most recently updated pieces of content since the last refresh (which by default is invoked every 5 minute). The ViewAction class extends com.jivesoftware.community.action.MainAction, which is the WebWork action that handles the display of the homepage of Clearspace, so ViewAction simply invokes

super.execute();

to get the same content you’d get if you viewed the homepage. When the Clearfox plugin refreshes every 5 or so minutes it invokes the

public String doUpdate()

method, passing in the time (in milliseconds) that the last piece of content it has a record of was updated. It gets back a (presumably) shorter list of content that it then updates the view with.

The login, logout and settings actions are all using in combination with AJAX and since none of them need to provide any data, they all return HTTP status code headers:

<action name="cf-login"
   class="com.jivesoftware.clearspace.plugin.clearfoxplugin.LoginAction">
   <result name="success" type="httpheader">
      <param name="status">200</param>
   </result>
   <result name="unauth" type="httpheader">
      <param name="status">401</param>
   </result>
</action>
<action name="cf-logout"
   class="com.jivesoftware.clearspace.plugin.clearfoxplugin.LogoutAction">
   <result name="success" type="httpheader">
   <param name="status">200</param>
   </result>
</action>
<action name="cf-settings"
   class="com.jivesoftware.clearspace.plugin.clearfoxplugin.SettingsAction">
   <result name="success" type="httpheader">
      <param name="status">200</param>
   </result>
</action>

The last action is used by the Firefox Extension framework to determine if the plugin (on the Firefox side) needs to be updated. The action descriptor looks like this:

<action name="cf-updater"
   class="com.jivesoftware.clearspace.plugin.clearfoxplugin.UpdaterAction">
   <result name="success">
      <param name="location">/plugins/clearfox/resources/updater.ftl</param>
      <param name="contentType">text/rdf</param>
   </result>
</action>

The one trick about this one is that the result sets an HTTP contentType header, which isn’t remarkable except that Clearspace uses Sitemesh, which attempts to decorate everything it can parse with a header and footer. The contentType header should be a hint to Sitemesh that you don’t want the result to be decorated / wrapped with a header and footer, but apparently the hint is to subtle for Sitemesh because it attempts to wrap the result of this action anyway, which leads to the second distinctive part of this plugin.

The way we bypass Sitemesh decoration in the core product is by modifying a file called templates.xml, which gives us the ability tell Sitemesh to exclude certain paths from being decorated. In Clearspace 1.7, which will be out in a couple weeks, plugins (which can’t modify templates.xml) will have the ability to tell Sitemesh about paths that they don’t want decorated. Hence the following entry in the plugin descriptor (always located at the root of plugin and named plugin.xml):

<sitemesh>
   <excludes>
      <pattern>/cf-updater.jsp*</pattern>
      </excludes>
</sitemesh>

Third, Another thing you may have noticed if you were following along with the source code (which is available from clearspace.jivesoftware.com) is that two of the action classes (ViewAction and LoginAction if you must know) are marked with the class annotation ‘AlwaysAllowAnonymous’. This annotation tells the Clearspace security interceptor that the anonymous users should be allowed to invoke the action without being logged in. This is probably a good place to remind you of one of the differences between Clearspace and ClearspaceX. Clearspace, by default, is configured to *require* users to login before they can see any content while ClearspaceX uses the opposite default: you only need to login (usually) if you want to post content. So back to the AlwaysAllowAnonymous annotation: it’s important mostly for the Clearspace (not ClearspaceX) implementations because you want to give people the ability to invoke the action and then the action itself handles the display of the login page in it’s own specific way. The RSS related actions in Clearspace work exactly the same way: they are all annotated with the AlwaysAllowAnonymous marker and then handle security via HTTP Basic Auth (Clearspace) or simply allow anonymous usage (ClearspaceX) because feed readers

Fourth, because the Firefox part of the plugin needs to be specific to your installation, all the Firefox plugin related files need to be zipped up to create the XPI file that your users will install into Firefox. The plugin framework gives you the ability to define a plugin class:

<class>com.jivesoftware.clearspace.plugin.clearfoxplugin.ClearFoxPlugin</class>

which implements com.jivesoftware.base.plugin.Plugin. The interface specifies a method:

public void initializePlugin(PluginManager manager, PluginMetaData pluginData);

which means that your plugin will get a chance to initialize itself. The Clearfox plugin uses this initialization hook to zip up the Firefox related files. The plugin is told where it lives:

public void initializePlugin(PluginManager manager, PluginMetaData metaData) {
   File pluginDir = metaData.getPluginDirectory();
   ...

and then goes on to zip up the files located in the xpi directory of the plugin source.

So that’s that… if you got this far you probably don’t care, but I actually did a screencast of the whole thing that lives over here or you can check it out on blip.tv.

Aaron Johnson