Category Archives: Open Source

Jabber, Open Source, Robots, Syndication

instantFeeds: real time notification of RSS updates via Openfire

March 21, 2007 ajohnson 1 Comment

I wrote a long winded post a couple months ago with a nebulous title called “IM and RSS: Rome is on Fire” where I talked about the feed bot that I wrote for Wildfire (which is now called Openfire). It’s been up and running now for a couple months, giving me a chance to work out some of the bugs and add a couple features and I think it’s ready for you to look at and use. The project page has all the details.

Also, I updated the client-side sign up form to use Prototype / script.aculo.us rather than the YUI stuff (which ended up costing somewhere north of 200k in JavaScript). Since you can run your own Openfire server and you can download and use the instantFeeds plugin, feel free to rip the JavaScript off this site if you want to enable your users to sign up to receive your blog updates via IM.

J2EE, Open Source, Syndication, XML

ROME and wfw namespace elements

January 11, 2007 ajohnson Leave a comment

I created a ROME parser and generator for wfw:comment and wfw:commentRss elements today. You can read all about it here and download the source code here.

Not sure what the wfw:comment or wfw:commentRss elements are for? Imagine you’re reading my blog in desktop aggregator and you want to post a comment to my blog. The aggregator has no way of knowing where your comments should be HTTP posted too, so you have to open up another tab in Firefox and go to my blog to enter a comment. wfw:comment is an element that provides the HTTP post endpoint in the feed so that aggregators and widget developers can comment inline, just like this. wfw:commentRss is an element that provides you (or your favorite aggregator) with a link to the comments for the post you’re currently viewing.

J2EE, Open Source, Uncategorized, WebWork

OGNL: getter and setter types must match

January 4, 2007 ajohnson 3 Comments

Yesterday I ran into a interesting bug with the WebWork application I spend my waking hours working on, at least initially I thought it was a WebWork bug. I had a WebWork action with a getter / setter combination that looked like this:

public void setUser(String username) {
...
}
public User getUser() {
...
}

The thinking here was that if I invoked the action using a request like this:

/myaction.jspa?user=aaron

then the action would look up the user based on the given username and populate a User object, which would then be returned by the getUser() method (populating the user instance using a custom IOC interceptor would have also worked but would have been overkill in this particular case). The problem was that the setUser(String username) method was never getting called. After checking and double checking the method names and the parameter being passed, I googled around and found this email thread which discusses the very issue, saying that the problem was in fact not with WebWork, nor with it’s underlying expression engine OGNL, but rather the ‘Java reflection API semantics’. In the thread, Laurie Harper, a Struts committer, says that the:

… restriction (where getter and setter have to be of the same type) comes from the Java relection API semantics, not OGNL. A property can only have one type, so it makes sense that the getter and setter for a JavaBean property must agree on that type.

I didn’t doubt what he said, but I needed to see it with my own eyes, so I dove into the deep sea that is OGNL, reflection and JavaBeans. First I read the JavaBeans specification, which seems to back up his story: page 55 of the PDF says this:

By default, we use design patterns to locate properties by looking for methods of the form:
public <PropertyType> get<PropertyName>();
public void set<PropertyName>(<PropertyType> a);
If we discover a matching pair of â€œget<PropertyName>â€ and â€œset<PropertyName>â€ methods
that take and return the same type, then we regard these methods as defining a read-write property whose name will be â€œ<propertyName>â€.

So the JavaBeans specification requires the getter and setter to be of the same type, but how does OGNL figure out that the I’m trying to trick it with two different types?

I found the answer by digging pretty deep: the OgnlRuntime class looks up a PropertyDescriptor for the given class (in my case the WebWork action) and then calls the getWriteMethod() on the PropertyDescriptor instance. The documentation for that method mentions that it may return null if the property can’t be written but the documentation doesn’t mention why that might happen. If you grab the source for the PropertyDescriptor class, you’ll see that the getWriteMethod() method invokes a private method findPropertyType(), which is where I finally found the culprit. It compares the return type of the getter and the parameter type of the setter and throws an IntrospectionException if the types don’t match (line 602), which the getWriteMethod() catches and then returns null, which leaves the OgnlRuntime with no method to call.

So it sounds to me like this is really a JavaBean specification requirement, not a Java reflection API semantic, but then I guess it’s all just semantics anyway right?

J2EE, Open Source, Software Development, Syndication

ROME, custom modules, publishdate and RSS

November 8, 2006 ajohnson Leave a comment

At work, I’ve taken on the work of migrating our RSS feeds currently being produced using JSP to ROME. Since we’ve added a few custom elements to the feeds available in Jive Forums (things like message and thread counts), I’m taking advantage of the feature in ROME that gives you the ability to programtically define namespaces in your RSS 2.0, Atom 0.3 and Atom 1.0 feeds (examples: the iTunes module and the OpenSearch module). Anyway, the code I wrote to add an item to the list of available items in a feed looked something like this:

...
entry = new SyndEntryImpl();
entry.setTitle(thread.getSubject());
entry.setLink("http://mysite.com/community/threads.jspa?id=" + 
   thread.getID());
entry.setUpdatedDate(thread.getModificationDate());
entry.setPublishedDate(thread.getCreationDate());
...
JiveForumsModule module = new JiveForumsModuleImpl();
module.setReplyCount(thread.getReplyCount());
List modules = new ArrayList();
modules.add(module);
entry.setModules(modules);
...

This code works, but if you view the feed, you don’t get a publish date on the item. I dug into the ROME source code a bit and found that the publish date is stored as part of the Dublin Core module, which I came to find out is a ‘special’ module that always exists on a SyndEntryImpl object. Take a look at the implementation of the getModules() method on the SyndEntryImpl class:

public List getModules() {
  if  (_modules==null) {
    _modules=new ArrayList();
  }
  if (ModuleUtils.getModule(_modules,DCModule.URI)==null) {
    _modules.add(new DCModuleImpl());
  }
  return _modules;
}

See how the method automatically injects a DCModuleImpl into the _modules property if the DCModule doesn’t exist? Long story short, the code I wrote blew away the _modules property on the SyndEntryImpl instance which contained a single DCModule which itself contained the publishedDate date instance. So by the time the feed was produced, the publish date I set on each SyndEntry was long gone. I should have written my code like this:

JiveForumsModule module = new JiveForumsModuleImpl();
module.setReplyCount(thread.getReplyCount());
entry.getModules().add(module);

Better yet, the ROME team could have done two things:

Added documentation to the setModules(List modules) method that pointed out that any information in the existing DCModule instance will be lost if the provided list doesn’t contain the existing DCModule instance.
Added a method to the SyndEntry interface called addModule(Module module).

Open source: I’m lovin it.

J2EE, Open Source, Personal

JDJ: Mailets and Matchers

September 20, 2006 ajohnson Leave a comment

Remember back a couple months ago when I posted an article about using Apache James to implement VERP? Well, it actually did end up getting published in JDJ. If you subscribe to the magazine, you should have it in your hot little hands right now or you can visit the archives. If not, you’ll have to wait until next month when the article becomes available for free.

Content Management, J2EE, JavaScript, Open Source, Software Development, XML

JSON: Making Content Syndication easier

August 21, 2006 ajohnson 3 Comments

At work we’ve been having some discussions about sharing content between two websites: the natural first option was an XML solution, in this case RSS. Site A would subscribe to the RSS feeds of the site B, periodically retrieving the updated feeds, caching the contents of each feed for a specified period of time all the while displaying the resulting content on various parts of site A.

A couple months ago (December 2005 to be exact), Yahoo started supporting JSON (a lightweight data interchange format which stands for JavaScript Object Notation), as optional result format for some of it’s web services. The most common thing said about JSON is that it’s better than XML, usually meaning that it’s easier to parse and not as verbose, here’s a well written comparison of XML and JSON if you don’t believe me. While the comparisons of simplicity, openness and interoperability are useful, I think JSON really stands out when you’re working in a browser. Going back to the example I used above where site A needs to display content from site B, as I see it, this a sample runtime / flow that bits travel through in order to make the syndication work:
every_n_seconds() --> retrieve_feed() --> store_feed_entries() and then per request to site A:
make_page() --> get_feed_entries() --> parse_entries() --> display_entries(). There are a number of libraries built in Java for creating and parsing RSS, some for fetching RSS and you there’s even a JSP taglib for displaying RSS. But even with all the libraries, there’s still a good amount of code to write and a number of moving parts you’ll need to maintain. If you do the syndication on the client side using JSON, there are no moving parts. To display just the title of each one of my del.icio.us posts as an example, you would end up with something like this:
<script type="text/javascript" src="http://del.icio.us/feeds/json/ajohnson1200"></script> <script type="text/javascript"> for (var i=0, post; post = Delicious.posts[i]; i++) { document.write(post.d + '<br />'); } </script>
I’m comparing apples to oranges (server side RSS retrieval, storage, parse and display against client side JSON include) but there are a couple of non obvious advantages and disadvantages:

Caching: If used on a number of pages, syndicated JSON content can reduce the number of bits a browser has to download to fully render a page. For example, let’s say (for arguments sake) that we have an RSS feed that is 17k in size and a corresponding JSON feed of the same size (even though RSS would inevitably be bigger). Using the server side RSS syndication, the browser will have to download the rendered syndicated content (again let’s say it’s 17k). Using the JSON syndicated feed across a number of page views, the browser would download the 17k JSON feed once and then not again (assuming the server has been configured to send a 304) until the feed has a new item. Winner: JSON / client
Rendering: Of course, having the browser parse and render a 17K JSON feed wouldn’t be trivial. From a pure speed standpoint, the server could do the parse / generate once and then used an HTML rendering of the feed from cache from then on. Winner: RSS / server
Searching: Using JSON on the client, site A (which is syndicating content from site B), wouldn’t have any way of searching the content, outside of retrieving / parsing/ storing on the server. Also, spiders wouldn’t see the syndicated content from site B on site A unlike the server side RSS syndication where the syndicated content would look no different to a spider than the other content on site A. Winner: RSS / server
Ubiquity: JSON ‘only’ works if the browser has JavaScript enabled, which I’m guessing the large majority of users do have JavaScript enabled. But certain environments won’t and phones, set top boxes and anything else that runs in a browser but not on a PC may not have JavaScript, which means they won’t see the syndicated content. Server side generated content will be available across any platform that understands HTML. Winner: RSS / server

So wrapping up, when should you use JSON on the client and when should you use RSS on the server? If you need to syndicate a small amount of content to non programmers who can cut and paste (or programmers who are adept at JavaScript), JSON seems like the way to go. It’s trivial to get something up and running, the browser will cache the feed you create and your users will see the new content as soon as it becomes available in your JSON feed.

If you’ve read this far, you should go on and check out the examples on developer.yahoo.com and on del.icio.us. Also, if you’re a Java developer, you should head on over to sourceforge.net to take a look at the JSON-lib, which makes it wicked easy to create JSON from lists, arrays and beans.

Content Management, Interface Design, Open Source, WebWork

WebWork and meaningful URLs

August 1, 2006 ajohnson 2 Comments

Personal pet peeve: meaningful URLs (which tonight I found out go by many names: pretty URLs, RESTian URLs, SES URLs, hackable URLs, etc…). At work, we use WebWork extensively but up until this point we haven’t made an effort to create meaningful URL’s. As with any well designed framework, it turns out that there are a couple of ways you can create meaningful URL’s, with different levels of meaningfulness.

Version 2.2 of WebWork introduced the ActionMapper interface and a class called RestfulActionMapper, which gives you the ability to create URLs that might look something like this:
http://bookstore.com/books/category/java/keyword/webwork
instead of the more common:
http://bookstore.com/books.jspa?category=java&keyword=webwork
The nice thing about the RestfulActionMapper implementation is that you don’t have to write any code to parse the URL: you set up your WebWork actions with the appropriate setters and the RestfulActionMapper handles the rest. The downside is that this still isn’t really a truly hackable URL. For example, although this URL:
http://bookstore.com/books/category/java/keyword
and this URL:
http://bookstore.com/books/category
would probably work, they don’t really make sense. Why are ‘keyword’ and ‘category’ hanging around at the end? Both of the words are extra information required by the implementation that don’t add any value to the user.

The second way you can create meaningful URLs is by creating your own ActionMapper. You can get a good start by checking out the source code for the DefaultActionMapper and the RestfulActionMapper. To set properties on your action instances, you’ll want to create a HashMap,, add the appropriate properties from your URL to the map and then either create and return a new ActionMapping using the action name and map or call the setParams() method on an existing mapping. The end result is that you should be able to create and use meaningful URL that looks like this:
http://bookstore.com/books/java/webwork
Also of note:

User-Centered URL Design
URL as UI
What is interesting about LibraryLink
Representational State Transfer
Cool URIs don’t change
Updated 8/1/2006: Great quote by David Gelernter via Bill Dehora: “If you have three pet dogs, give them names. If you have 10000 head of cattle, don’t bother.”

J2EE, Jabber, Open Source, Personal, Software Development

Wildfire Enterprise launches..

June 30, 2006 ajohnson

For the last couple weeks I’ve been working on the ‘Deep, Real-time Reporting’ features of Wildfire Enterprise, which officially launched today (if you’ve ever thought about messing around with instant messaging / XMPP / Jabber, I’d highly recommend checking it out and that’s the last corporate pitch you’ll hear out of me). Herewith (as Tim Bray likes to say) are some notes about the tools we used to develop said features.

Prototype / script.aculo.us: Before I started working at Jive in April, I had done enough JavaScript to get by but my general attitude toward it was disdain. It seemed like it was always safer and easier to do things on the server. Prototype and script.aculo.us changed all that for me. If you haven’t worked with either: get started now. For whatever reason, the prototype website doesn’t contain a lick of documentation, so head on over to this site and this site once you’re ready to dive in. If nothing else, get rid of all your document.getElementById('thisAndthat') code and replace it with $('thisAndThat'), you’ll feel cleaner afterward. Also, make sure you get the latest version of prototype by downloading the latest version of Script.aculo.us because there’s a nasty bug with prototype 1.4 which I’ve written about before.
Which leads me to script.aculo.us. Go ahead, be amazed at the drag and drop shopping cart, the ‘sortable elements with AJAX callback’ and the sortable floats. Now, put all that useless stuff aside and look at the core library, called Visual Effects, which include the ability to do just about anything, but make it really easy to do things like slide elements in and out, show and hide elements, and highlight elements. You can see some of the cool effects in action by watching this movie our marketing team made ofthe dashboard in action.
DWR: I didn’t mention anything about AJAX in the previous two paragraphs because all of the AJAX work is being done by DWR, which takes away all the hassle of binding the objects that exist on your server to the logic on the client and back again. You don’t have to deal with JSON, XML or HTTP requests. On the server you create a class, expose one or more methods and point your DWR config file at the class. On the client, you include a couple DWR JavaScript files and write JavaScript to call methods on the server. It’s drop dead simple. I highly recommend trying it out if you’re using a Java servlet container as your platform. We used DWR in a couple different places, but notably on the dashboard to create a digg.com/spy-esque view of the conversations on the server.
JRobin: Before working on this project I had never seen JRobin or heard about RRD. In short, it’s a way of keeping statistics about your system over time without having to maintain giant log files or large databases. So if you ever find yourself in a situation where you want to keep track of oh, let’s say the number of database connections to a server, over a period of minutes, hours, days, weeks, months and years, take a look at JRobin. It’s fast, can record just about anything you want and the size of the log file(s) it generates will be the same on day one as it is on day 323.
JFreeChart: I’ve used JFreeChart on a number of projects in the past, but I used it more on this project than any other. It is not the easiest tool in the world to use. There are about 600 different knobs you can turn, which makes it tedious to make something look exactly like you want it too and the full documentation costs $40, but the product itself is free.
iText: Last but not least, we used iText to produce some pretty nifty looking reports in PDF format, which includ the graphs created by JFreeChart. I hadn’t ever used iText for anything more than messing around in ColdFusion back in the day, but it was still relatively easy to get something working quickly. It would be fantastic to be able to hand iText an HTML document and say ‘create a PDF’, but you can’t have everything in life.

And that’s the story! Send me an email if you have any questions about how we integrated the above tools, I’d be happy to help.

J2EE, Open Source, Software Development

Using Apache James and JavaMail to implement Variable Envelope Return Paths

June 9, 2006 ajohnson 7 Comments

I submitted a skeleton of the article that follows to JDJ, was told they would like to ‘commission’ it and then finally submitted it only to never hear anything back from them. So instead you get to read it here. Enjoy!

—————————–

Anyone who has spent any time working a application that sends emails has come across more than their fair share of bounced emails. If you actually read the bounced emails, you probably noticed that many of them either came from the ISP’s error handler (mailer-daemon@isp.com) or from an email address that wasn’t on your mailing list. The bounced message may not even have contained a copy of the original message. All of above scenarios make it very hard to figure out who the original message was sent to. Enter Daniel Bernstein, also known as djb, who in 1997, in response to this problem of matching email bounce messages to subscription addresses, wrote a paper describing a technique he called Variable Envelope Return Paths or VERP for short. In the paper, he describes process as:

…each recipient of the message sees a different envelope sender address. When a message to the djb-sos@silverton.berkeley.edu mailing list is sent to God@heaven.af.mil, for example, it has the following envelope sender:

djb-sos-owner-God=heaven.af.mil@silverton.berkeley.edu

If the message bounces, the bounce message will be sent back to djb-sos-owner-God=heaven.af.mil@silverton.berkeley.edu.

If God is forwarding His mail, the bounce message will still go to djb-sos-owner-God=heaven.af.mil@silverton.berkeley.edu. No matter how uninformative the bounce message is, it will display God’s subscription address in its envelope.

But you probably noticed that this article isn’t only about VERP: Apache James is a full featured SMTP, POP3 and NNTP server built using 100% Java and more importantly it has been designed from the ground up to be a mail application platform. The James mail application platform makes it a perfect candidate for handling bounced messages using VERP. Similarly, the JavaMail API is a framework for building mail and messaging applications using SMTP and POP3. JavaMail makes it easy to customize the envelope sender address, which means Java developers can utilize JavaMail on the client and James on the server to build email applications that enables VERP.

This article will describe an example VERP implementation, show how JavaMail can be used to modify the envelope sender address and will then illustrate how James can be used to recognize and process bounced email messages. It is not intended to be a in-depth look at either the Apache James mail server or the JavaMail API. If you’re interested in learning more about Apache James, a product review is available on the Sys-Con.com website (http://java.sys-con.com/read/38667.htm) and an extensive introduction to Apache James on IBM developerWorks (http://www-128.ibm.com/developerworks/library/j-james1.html). The JavaMail API is also reviewed on the Sys-Con.com website: http://java.sys-con.com/read/36545.htm.

VERP and JavaMail
Let’s start by looking at the email newsletter that a fictional store called ï¿½Javazon’ is sending to its’ customers. The developers at Javazon have been using the JavaMail API to successfully send the newsletter through their mail server using code similar to the example below.
String senderemail = "deals@javazon.com"; String toemail = "ajohnson@cephas.net"; Properties props = new Properties(); props.put("mail.smtp.host", mailserver); Session session = Session.getInstance(props, null); javax.mail.Message m = new MimeMessage(session); m.setFrom(new InternetAddress(senderemail)); m.setSubject("New Deals at Javazon!"); m.setRecipient(javax.mail.Message.RecipientType.TO, new InternetAddress(toemail)); m.setContent(content, "text/plain"); Transport.send(m);
The above code will produce an email message with headers that look like this:
Date: Wed, 26 Apr 2006 21:00:21 -0000 From: deals@javazon.com To: ajohnson@cephas.net Subject: New Deals at Javazon!
Because they want to good email citizens, the developers at Javazon use the POP3 functionality in JavaMail to retrieve the emails that bounce back to the address specified as ï¿½senderemail’ in the example above. Unfortunately, many of the bounce emails come from daemon accounts (instead of the recipient email address) which makes it difficult to figure out what email address the original message was sent to.

As mentioned at the start of this article, the only way to address the bounces that come from daemon accounts is to use VERP, which is a two part process. The first is relatively simple. An email message, according to the SMTP RFC-821 Section 2, is composed of two parts: an envelope which contains the SMTP source and destination addresses and the message, which consists of the headers and message body. To create a VERP capable email message, you need only modify the envelope, which is easily accomplished using the instance of java.util.Properties associated with the javax.mail.Session. Modifying the first example, the developers would end up with this:
String senderemail = "deals@javazon.com"; String toemail = "ajohnson@cephas.net"; String verpFrom = "deals-" + toemail.replaceAll("@", "=") + "@javazon.com"; Properties prop = new Properties(); props.put("mail.smtp.from", verpFrom); props.put("mail.smtp.host", mailserver); Session session = Session.getInstance(props, null); javax.mail.Message m = new MimeMessage(session); m.setFrom(new InternetAddress(senderemail)); m.setSubject("New Deals at Javazon!"); m.setRecipient(javax.mail.Message.RecipientType.TO, new InternetAddress(toemail)); m.setContent(content, "text/plain"); Transport.send(m);
When excecuted, the code above would create an email message with headers that look like this:
Return-Path: deals-ajohnson=cephas.net@javazon.com Date: Wed, 26 Apr 2006 21:00:21 -0000 From: deals@javazon.com To: ajohnson@cephas.net Subject: New Deals at Javazon!
Notice the different “Return Path:” header from the first email? If the email message bounces back to Javazon, it will go to the email address associated with the ‘Return Path’ header: “deals-ajohnson=cephas.net@javazon.com” rather than “deals@javazon.com”. This is where Apache James comes into the picture.

VERP and James
James can be configured and used like any other email server, but it’s real power comes from the ability it gives Java developers to plug right into the mail processing pipeline. James enables you to process email messages in a same way you might process HTTP requests that come into servlet container like Tomcat, but in a more flexible manner. If you want to preprocess (or postprocess) HTTP requests in Tomcat, you first create a class that implements the javax.servlet.Filter interface and then you create an entry in your web.xml that matches certain requests to that class. Your configuration might look something like this:
<filter> <filter-name>myfilter</filter-name> <filter-class>com.javazon.web.filters.GZipFilter</filter-class> </filter> <filter-mapping> <filter-name>myfilter</filter-name> <url-pattern>*.jsp</url-pattern> </filter-mapping>
The servlet container limits how you match requests to a filter: you are limited to pattern matching on the URL. Instead of a <filter> and <filter-mapping>, James gives you a <mailet>, which is made up of two parts: Matchers and Mailets. They are described on the James wiki:
“Matchers are configurable filters which filter mail from a processor pipeline into Mailets based upon fixed or dynamic criteria.

Mailets are classes which define an action to be performed. This can cover actions as diverse as local delivery, client side mail filtering, switch mail to a different processor pipeline, aliasing, archival, list serving, or gateways into external messaging systems.”

James ships with a number of Mailets and Matchers that you can use without writing a line of code, but the developers at Javazon will need to write their own Matcher and Mailet to handle the bounces generated from their email campaigns.

So the first thing the developers at Javazon are going to need to do is create a class that intercepts the bounces emails. A matcher class can be created in one of two ways: a) create a class that implements the org.apache.mailet.Matcher interface, or b) create a class that extends the org.apache.mailet.GenericMatcher class. Because GenericMatcher already implements both Matcher and MatcherConfig and because it provides simple version of the lifecycle methods, the path of least resistance is to extend GenericMatcher. The NewsletterMatcher class is going to ‘match’ only the recipients where the address of the recipient starts with the string “deals-“:
public class NewsletterMatcher extends GenericMatcher { public Collection match(Mail mail) throws MessagingException { Collection matches = new ArrayList(); Collection recipients = mail.getRecipients(); for (Iterator i=recipients.iterator(); i.hasNext();) { String recipient = (String)i.next(); if (recipient.startsWith("deals-")) { matches.add(recipient); } } return matches; } }

The NewsletterMatcher class, as you can see, returns a Collection of String objects, each presumably an email that has bounced. To do something with these matches, the developers will need to write a class that either implements the org.apache.mailet.Mailet interface or a class that extends the org.apache.mailet.GenericMailet class. Again, it will be simpler to extend the GenericMailet class:
public class NewsletterMailet extends GenericMailet { private static CustomerManager mgr = CustomerManager.getInstance(); public void service(Mail mail) throws MessagingException { Collection recipients = mail.getRecipients(); for (Iterator i=recipients.iterator(); i.hasNext();) { String recipient = (String)i.next(); if (recipient.startsWith("deals-")) { int atIndex = recipient.indexOf("@"); String rec = recipient.substring(0,atIndex) .replaceAll("=", "@") .replaceAll("deals-", ""); mgr.recordBounce(rec); mail.setState(Mail.GHOST); } } } }

In the above example, the NewsletterMailet class overrides the service() method in the GenericMailet class, loops over the list of recipients in the given email message and then checks to see if the recipient email address starts with the string “deals-“. If the recipient email address does start with “deals-“, then the class decodes the original recipient address by retrieving what is generally the username part of the email address, replacing the equals sign (=) with an @ sign and then replacing the “deals-” prefix. Then the Newsletter mailet class uses CustomerManager (a class that the Javazon developers use to manage customer information) to record the bounced email. If you were to step through the process, you’d see the recipient email address start as something like this:
deals-ajohnson=cephas.net@javazon.com
and then change to this:
deals-ajohnson=cephas.net
and finally to this:
ajohnson@cephas.net
The last step is to wire the mailet and matcher classes together in the Apache James configuration file, which is usually located here:
$JAMES/apps/james/SAR-INF/config.xml
You’ll need to make a number of entries. First, you’ll need to let James know where it should look for the mailet and matcher classes you’ve created by creating <mailetpackage> and <matcherpackage> entries inside the <mailetpackages> and <matcherpackages> elements:
<mailetpackages> ... <mailetpackage>com.javazon.mailets</mailetpackage> </mailetpackages> <matcherpackages> ... <matcherpackage>com.javazon.matchers</matcherpackage> </matcherpackages>
Then add references to the matcher and the mailet using a <mailet> element like this:
<mailet match="NewsletterMatcher" class="NewsletterMailet"> <processor>transport</processor> </mailet>
The match attribute of the mailet element specifies the name of the matcher class that should be instantiated when the matcher is invoked by the spool processor and the class attribute specifies the name of the mailet that you want invoked should the matcher class return any hits.

After adding these configuration entries and adding the compiled classes to the $JAMES/apps/james/SAR-INF/lib/ directory, restart the James process.

Testing
In order to test the configuration / application, you’ll need to have a James server configured and available via the internet via port 25 with a valid DNS name and a corresponding MX record. As an example. the system administrator at Javazon would configure a machine with James, make it available to the internet on port 25 and assign it a domain name like bounces.javazon.com. The developers could then send an invalid email using JavaMail to:
bounceme@javazon.com
(an account which probably doesn’t exist on the main javazon.com mail server) with a return path of :
deals-bounceme=javazon.com@bounces.javazon.com.
The bounce email will be sent to the server associated with the MX record for the domain name bounces.javazon.com, which should be the server the system administrator set up above. The NewsletterMatcher class should ‘match’ on the “deals-” prefix and then pass it to the NewsletterMailet, which should record the bounce using the CustomerManager instance.

Conclusion
After reading this article, you should hurry on over to the Apache James website, download the latest distribution and read the documentation. There are a number of other interesting ways you can improve your email processing by extending James using mailets and matchers.

References
————————————————————
JavaMail
· http://java.sun.com/products/javamail/
· http://jdj.sys-con.com/read/36545.htm
· http://www.ibm.com/developerworks/java/edu/j-dw-javamail-i.html

VERP
· http://cr.yp.to/proto/verp.txt

Apache James
· http://james.apache.org/
· http://jdj.sys-con.com/read/38667.htm
· http://www.ibm.com/developerworks/java/library/j-james1.html
· http://www-128.ibm.com/developerworks/java/library/j-jamess2.html
· http://james.apache.org/spoolmanager_configuration_2_1.html

J2EE, Lucene, Open Source, Software Development

Nutch, Yahoo!, and Hadoop

March 14, 2006 ajohnson 1 Comment

It’s been awhile since I mentioned anything about Lucene, my favorite Java based open source indexing and search library (which I built the karakoram spider / search application around). Doug Cutting, who created Lucene and who has spent the last couple years working on Nutch, was recently hired by Yahoo!. I just have a couple questions:

a) why would Yahoo want to hire a guy writing a Java based web crawler and indexer?

b) where does he get all the cool names? Nutch? Hadoop?

c) How cool does Hadoop sound? Hadoop Distributed Filesystem (HDFS) and an implementation of MapReduce. Hmm.. where else have I heard about those terms bantered about?

Aaron Johnson