Category Archives: Software Development

AtomicInteger

A couple of weeks ago I came across some code that used an instance of AtomicInteger, which is part of the java.util.concurrent.atomic package. I believe most of that package was written by Doug Lea, author of the excellent util.concurrent library. I wrote the class name down in my todo.txt thinking that it would be interesting to find out how AtomicInteger (and all the other atomic classes) lets you increment a shared integer in a multi-threaded environment, but I didn’t get around to it until this past weekend. If you’re not familiar with the atomic classes in JDK 5.0, the big takeaway is that they “…support lock-free thread-safe programming on single variables”. In code that means that if you have this:

public final class Counter {
  private long value = 0;
  public synchronized long getValue() {
    return value;
  }

  public synchronized long increment() {
    return ++value;
  }
}

you should be using something more like this:

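import java.util.concurrent.atomic.AtomicInteger;

public class NonblockingCounter {
    // Must be initialized, otherwise the first call throws a NullPointerException.
    private final AtomicInteger value = new AtomicInteger();

    public int getValue() {
        return value.get();
    }

    public int increment() {
        int v;
        // Optimistic retry loop: re-read and try again if another thread got in first.
        do {
            v = value.get();
        } while (!value.compareAndSet(v, v + 1));
        return v + 1;
    }
}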
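
To see the retry loop hold up under contention, here is a minimal sketch that has several threads increment one counter at the same time. The class name CasCounterDemo and the hammer method are made up for this illustration, and the lambda syntax needs a newer JDK than the 5.0 release discussed here; the increment logic is the same compareAndSet loop as in NonblockingCounter.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: several threads bump one CAS-based counter.
class CasCounterDemo {
    private final AtomicInteger value = new AtomicInteger();

    // Same optimistic retry loop as NonblockingCounter.increment().
    int increment() {
        int v;
        do {
            v = value.get();
        } while (!value.compareAndSet(v, v + 1));
        return v + 1;
    }

    int get() {
        return value.get();
    }

    // Starts `threads` workers that each call increment() `perThread` times,
    // waits for all of them to finish, and returns the final counter value.
    static int hammer(int threads, int perThread) {
        CasCounterDemo counter = new CasCounterDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // No increments are lost, so this prints 800000.
        System.out.println(hammer(8, 100_000));
    }
}
```

If you swap the loop for a plain `value++` on an unsynchronized int field, the final number will usually come up short, which is the problem both Counter and NonblockingCounter are solving.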


Regardless of your familiarity, herewith are some notes on my research:

  • I downloaded the source code for JDK 5.0 and popped open the source for AtomicInteger, skipping down to the getAndIncrement method. The source for that method is curious, to say the least. It starts with a for loop that never ends, retrieves the current int value, adds one to that value and then fires the compareAndSet method, only returning if the compareAndSet method returns true. It took me a couple minutes to wrap my brain around this small block of code: no locking and what looks like an endless loop. Here it is if you don’t feel like downloading the code yourself:
        public final int getAndIncrement() {
            for (;;) {
                int current = get();
                int next = current + 1;
                if (compareAndSet(current, next))
                    return current;
            }
        }
    

    What’s remarkable about this block is that it seemingly ignores the fact that it was designed to be accessed simultaneously by multiple threads… and in fact that’s exactly how it was designed. If this method were a conversation between a person and a machine, it might go something like this:

    Person: Get me the current value!
    Machine: 1
    Person: Add one to that value!
    Machine: 2
    Person: If 1 is equal to the current value, then set the current value to 2, otherwise, let’s do it all over again!
    Machine: Try again.
    Person: WEEE! Let’s try again!

    I think the exclamation points are apt in this case because it turns out that the atomic classes use algorithms that are sometimes called ‘optimistic’ (which is probably not a trait that you ascribe to most programmers unless they’re talking about schedules) but are described on Wikipedia (and probably elsewhere) as ‘lock-free’ or ‘wait-free’ algorithms. They’re optimistic in the sense that they assume the best case:

    I’m sure that no other threads will be accessing the method when I’m using the method. Even if they do, I don’t care, I’ll just try again (and again and again)…

    Optimistic. The pessimistic version assumes the worst case:

    I’m sure some other thread will be trying to access the method when I’m using the method. I’ll have to block them from using it every time I invoke this method.

    If you check out the Wikipedia article I linked to above, you’ll see that implementations of these lock-free algorithms use atomic primitives, the notable one being ‘compare and swap’. In Java, the getAndIncrement method on the AtomicInteger class relies on the class’s compareAndSet method, which is backed by a class called Unsafe, which itself doesn’t have publicly available documentation. However, if you download the JDK 5.0 source code like I did and peek at it, you’ll see a method that looks like this:

    public final native boolean compareAndSwapInt(Object o, long offset, int expectedValue, int newValue);
    

    And that’s where the source code adventure stopped. If I’m understanding the method signature correctly, it means Sun has a native library (written in C?) for every OS that sets the value of the Object o at memory location ‘offset’ to value ‘newValue’ if the current value is equal to ‘expectedValue’.

  • I found a really really great article on IBM developerWorks by Brian Goetz[1] called “Going atomic” (you’ll probably recognize the Counter and NonblockingCounter examples I posted above). In it he does a great job describing compare and swap and the difference between a lock-free and a wait-free implementation and then, just to show off, shows some benchmarking graphs that compare synchronization, ReentrantLock, fair Lock and AtomicLong on an 8-way Ultrasparc3 and a single-processor Pentium 4.

    You should also read his articles on developerWorks called More flexible, scalable locking in JDK and Introduction to nonblocking algorithms.

  • Dave Hale wrote an article that discusses the use of AtomicIntegers in a multi-threaded environment called “Parallel Loops with Atomic Integers“. He also pointed out an article published in the March 2005 edition of Dr. Dobb’s Journal called “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software“.
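
The read, compute, compareAndSet, retry pattern from the notes above is not limited to incrementing. As a rough sketch (the AtomicMax class and its offer method are names I made up, not part of the JDK), here is the same optimistic loop used to keep track of the largest value seen by any thread:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: tracks a running maximum without locking.
class AtomicMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    // Records `candidate` if it is larger than the current maximum
    // and returns the maximum as of this call.
    int offer(int candidate) {
        for (;;) {
            int current = max.get();
            if (candidate <= current) {
                return current;       // nothing to write, so no CAS is attempted
            }
            if (max.compareAndSet(current, candidate)) {
                return candidate;     // our write won
            }
            // Another thread changed the value in between: loop and re-read.
        }
    }

    int get() {
        return max.get();
    }
}
```

The shape is identical to getAndIncrement: the only thing that changes is how the next value is computed from the current one.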

Hope you learned something today!

1: I saw Brian speak at the No Fluff Just Stuff Symposium in Boston last fall. He’s as good in person as he is on paper. Go hear him speak if you have a chance.

JSON: Making Content Syndication easier

At work we’ve been having some discussions about sharing content between two websites: the natural first option was an XML solution, in this case RSS. Site A would subscribe to the RSS feeds of the site B, periodically retrieving the updated feeds, caching the contents of each feed for a specified period of time all the while displaying the resulting content on various parts of site A.

A couple months ago (December 2005 to be exact), Yahoo started supporting JSON (JavaScript Object Notation, a lightweight data interchange format) as an optional result format for some of its web services. The most common thing said about JSON is that it’s better than XML, usually meaning that it’s easier to parse and not as verbose; here’s a well written comparison of XML and JSON if you don’t believe me. While the comparisons of simplicity, openness and interoperability are useful, I think JSON really stands out when you’re working in a browser. Going back to the example I used above where site A needs to display content from site B, as I see it, this is a sample runtime flow that bits travel through in order to make the syndication work:
every_n_seconds() --> retrieve_feed() --> store_feed_entries() and then per request to site A:
make_page() --> get_feed_entries() --> parse_entries() --> display_entries(). There are a number of libraries built in Java for creating and parsing RSS, some for fetching RSS, and there’s even a JSP taglib for displaying RSS. But even with all the libraries, there’s still a good amount of code to write and a number of moving parts you’ll need to maintain. If you do the syndication on the client side using JSON, there are no moving parts. To display just the title of each one of my del.icio.us posts as an example, you would end up with something like this:

<script type="text/javascript" src="http://del.icio.us/feeds/json/ajohnson1200"></script>
<script type="text/javascript">
for (var i=0, post; post = Delicious.posts[i]; i++) {
  document.write(post.d + '<br />');
}
</script>
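
For contrast, here is a rough Java sketch of the retrieve-and-store half of the server-side flow. It is only an illustration: FeedCache is a made-up class, the Supplier stands in for retrieve_feed(), and instead of a timer that fires every n seconds it uses a lazy time-to-live check, which is a simpler variation on the same idea.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical server-side cache: re-fetches the feed only when the
// cached copy is older than ttlMillis.
class FeedCache {
    private static final class Entry {
        final String body;
        final long fetchedAt;

        Entry(String body, long fetchedAt) {
            this.body = body;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Supplier<String> fetcher;   // stands in for retrieve_feed()
    private final long ttlMillis;
    private final AtomicReference<Entry> cached = new AtomicReference<>();

    FeedCache(Supplier<String> fetcher, long ttlMillis) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
    }

    // Plays the role of get_feed_entries(): returns the cached feed,
    // refreshing it first if it is missing or stale.
    String get(long nowMillis) {
        Entry e = cached.get();
        if (e == null || nowMillis - e.fetchedAt >= ttlMillis) {
            e = new Entry(fetcher.get(), nowMillis);
            cached.set(e);
        }
        return e.body;
    }
}
```

Even this stripped-down version leaves out the parsing, the error handling for a feed that is down, and the display code, which is the point of the comparison: those are the moving parts the script include avoids.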

I’m comparing apples to oranges (server side RSS retrieval, storage, parse and display against client side JSON include) but there are a couple of non obvious advantages and disadvantages:

  1. Caching: If used on a number of pages, syndicated JSON content can reduce the number of bits a browser has to download to fully render a page. For example, let’s say (for argument’s sake) that we have an RSS feed that is 17k in size and a corresponding JSON feed of the same size (even though RSS would inevitably be bigger). Using the server side RSS syndication, the browser will have to download the rendered syndicated content (again let’s say it’s 17k) with every page. Using the JSON syndicated feed across a number of page views, the browser would download the 17k JSON feed once and then not again (assuming the server has been configured to send a 304) until the feed has a new item. Winner: JSON / client
  2. Rendering: Of course, having the browser parse and render a 17k JSON feed wouldn’t be trivial. From a pure speed standpoint, the server could do the parse / generate once and then use an HTML rendering of the feed from cache from then on. Winner: RSS / server
  3. Searching: Using JSON on the client, site A (which is syndicating content from site B) wouldn’t have any way of searching the content, outside of retrieving / parsing / storing on the server. Also, spiders wouldn’t see the syndicated content from site B on site A, unlike the server side RSS syndication where the syndicated content would look no different to a spider than the other content on site A. Winner: RSS / server
  4. Ubiquity: JSON ‘only’ works if the browser has JavaScript enabled, which I’m guessing the large majority of users do. But certain environments won’t: phones, set top boxes and anything else that runs a browser but isn’t a PC may not have JavaScript, which means they won’t see the syndicated content. Server side generated content will be available across any platform that understands HTML. Winner: RSS / server
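
The caching point depends on the feed server answering repeat requests with a 304. As a minimal sketch of that decision (ConditionalFeed is a made-up helper, and using String.hashCode as the validator is an assumption for brevity; a real server would more likely use a stronger hash or a last-modified timestamp):

```java
// Hypothetical helper showing the 200-versus-304 decision for a feed.
class ConditionalFeed {
    // Derives a validator (ETag) from the feed body.
    static String etagFor(String body) {
        return "\"" + Integer.toHexString(body.hashCode()) + "\"";
    }

    // Returns the HTTP status to send: 304 when the client's
    // If-None-Match header matches the current feed, otherwise 200.
    static int statusFor(String body, String ifNoneMatch) {
        return etagFor(body).equals(ifNoneMatch) ? 304 : 200;
    }
}
```

The browser sends back the validator it received with the cached copy; as long as the feed body hasn’t changed the server answers 304 with no body, and the 17k only travels again when a new item shows up.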

So wrapping up, when should you use JSON on the client and when should you use RSS on the server? If you need to syndicate a small amount of content to non programmers who can cut and paste (or programmers who are adept at JavaScript), JSON seems like the way to go. It’s trivial to get something up and running, the browser will cache the feed you create and your users will see the new content as soon as it becomes available in your JSON feed.

If you’ve read this far, you should go on and check out the examples on developer.yahoo.com and on del.icio.us. Also, if you’re a Java developer, you should head on over to sourceforge.net to take a look at the JSON-lib, which makes it wicked easy to create JSON from lists, arrays and beans.

FluentInterface

A couple of weeks ago on the DWR users list, in the context of needing to wire up DWR without using an XML file, Joe Walker pointed to a blog posting by Martin Fowler. In it, Martin discusses an interface style called a ‘fluent interface’. It’s a little difficult to describe in words (so check it out in action in the above-mentioned blog post) but I think Piers Cawley described it best when he described the style as “…essentially interfaces that do a good job of removing hoopage.” Update: Geert Bevin uses this style in the RIFE framework and was calling it “chainable builder methods” before Martin came along with the ‘fluent interface’ term.
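
Since the style is easier to show than to describe, here is a tiny made-up example (ServerSetup and its methods are hypothetical and not taken from DWR, RIFE or Martin’s post). The whole trick is that every method returns the object itself, and the object remembers which element was added last so that follow-up calls apply to it:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fluent configuration object.
class ServerSetup {
    private final Map<String, Map<String, String>> services = new LinkedHashMap<>();
    private Map<String, String> current;   // params of the most recently added service

    // Adds a service and makes it the target of subsequent addParam() calls.
    ServerSetup withService(String name) {
        current = new LinkedHashMap<>();
        services.put(name, current);
        return this;
    }

    // Adds a parameter to the most recently added service.
    ServerSetup addParam(String key, String value) {
        if (current == null) {
            throw new IllegalStateException("call withService() first");
        }
        current.put(key, value);
        return this;
    }

    // Ends the chain and hands back what was configured.
    Map<String, Map<String, String>> finished() {
        return services;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> cfg = new ServerSetup()
            .withService("clock")
                .addParam("zone", "UTC")
            .withService("mail")
                .addParam("host", "localhost")
                .addParam("port", "25")
            .finished();
        System.out.println(cfg);
    }
}
```

The indentation in the call chain is purely for the reader; the compiler sees one long expression.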

Back to DWR.  I spent the last couple days working on a ‘fluent’ way of configuring DWR which obviously then wouldn’t require dwr.xml, the result of which is available here. In short, given an XML configuration file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dwr PUBLIC "-//GetAhead Limited//DTD Direct Web Remoting
1.0//EN" "http://www.getahead.ltd.uk/dwr/dwr10.dtd">
<dwr>
  <init>
    <converter id="testbean" class="uk.ltd.getahead.testdwr.TestBean2Converter"/>
  </init>
  <allow>
    <create creator="new" javascript="Test" scope="application">
      <param name="class" value="uk.ltd.getahead.testdwr.Test"/>
    </create>
    <create creator="new" javascript="JDate">
      <param name="class" value="java.util.Date"/>
      <exclude method="getHours"/>
      <auth method="getMinutes" role="admin"/>
      <auth method="getMinutes" role="devel"/>
    </create>
    <convert converter="bean" match="$Proxy*"/>
    <convert converter="testbean" match="uk.ltd.getahead.testdwr.TestBean"/>
    <convert converter="bean" match="uk.ltd.getahead.testdwr.ObjB"/>
    <convert converter="object" match="uk.ltd.getahead.testdwr.ObjA">
      <param name="force" value="true"/>
    </convert>
  </allow>
  <signatures>
  <![CDATA[
  import java.util.*;
  import uk.ltd.getahead.testdwr.*;
  Test.testBeanSetParam(Set<TestBean>);
  Test.testBeanListParam(List<TestBean>);
  Test.charTestBeanMapParam(Map<Character, TestBean>);
  Test.stringStringMapParam(Map<String, String>);
  Test.stringStringHashMapParam(HashMap<String, String>);
  Test.stringStringTreeMapParam(TreeMap<String, String>);
  Test.stringCollectionParam(Collection<String>);
  Test.stringListParam(List<String>);
  Test.stringLinkedListParam(LinkedList<String>);
  Test.stringArrayListParam(ArrayList<String>);
  Test.stringSetParam(Set<String>);
  Test.stringHashSetParam(HashSet<String>);
  Test.stringTreeSetParam(TreeSet<String>);
  ]]>
  </signatures>
</dwr>

you can instead configure DWR using the FluentConfiguration class like this:

FluentConfiguration fluentconfig = (FluentConfiguration)configuration;
fluentconfig
  .withConverterType("testbean", "uk.ltd.getahead.testdwr.TestBean2Converter")
  .withCreator("new", "Test")
    .addParam("scope", "application")
    .addParam("class", "uk.ltd.getahead.testdwr.Test")
  .withCreator("new", "JDate")
    .addParam("class", "java.util.Date")
    .exclude("getHours")
    .withAuth("getMinutes", "admin")
    .withAuth("getMinutes", "devel")
  .withConverter("bean", "$Proxy*")
  .withConverter("testbean", "uk.ltd.getahead.testdwr.TestBean")
  .withConverter("bean", "uk.ltd.getahead.testdwr.ObjB")
  .withConverter("object", "uk.ltd.getahead.testdwr.ObjA")
    .addParam("force", "true")
  .withSignature()
    .addLine("import java.util.*;")
    .addLine("import uk.ltd.getahead.testdwr.*;")
    .addLine("Test.testBeanSetParam(Set<TestBean>);")
    .addLine("Test.testBeanListParam(List<TestBean>);")
    .addLine("Test.charTestBeanMapParam(Map<Character, TestBean>);")
    .addLine("Test.stringStringMapParam(Map<String, String>);")
    .addLine("Test.stringStringHashMapParam(HashMap<String, String>);")
    .addLine("Test.stringStringTreeMapParam(TreeMap<String, String>);")
    .addLine("Test.stringCollectionParam(Collection<String>);")
    .addLine("Test.stringListParam(List<String>);")
    .addLine("Test.stringLinkedListParam(LinkedList<String>);")
    .addLine("Test.stringArrayListParam(ArrayList<String>);")
    .addLine("Test.stringSetParam(Set<String>);")
    .addLine("Test.stringHashSetParam(HashSet<String>);")
    .addLine("Test.stringTreeSetParam(TreeSet<String>);")
  .finished();

If you’re interested in using this in your DWR project, you need only to:

  • create a class that extends DWRServlet (example: check out FluentDWRServlet.java in the zip file) and use that class as your DWR servlet
  • add a configuration param in web.xml called uk.ltd.getahead.dwr.Configuration and set the value to net.cephas.dwr.FluentConfiguration
  • add a configuration param in web.xml called skipDefaultConfig and set the value to true

    <servlet>
      <servlet-name>dwr</servlet-name>
      <servlet-class>net.cephas.dwr.FluentDWRServlet</servlet-class>
        <init-param>
          <param-name>uk.ltd.getahead.dwr.Configuration</param-name>
          <param-value>net.cephas.dwr.FluentConfiguration</param-value>
        </init-param>
        <init-param>
          <param-name>skipDefaultConfig</param-name>
          <param-value>true</param-value>
        </init-param>
    </servlet>

  • and then override the configure method in the servlet and use the fluent style of configuration I used above.

    Send me an email if you have any questions!

Wildfire Enterprise launches..

    For the last couple weeks I’ve been working on the ‘Deep, Real-time Reporting’ features of Wildfire Enterprise, which officially launched today (if you’ve ever thought about messing around with instant messaging / XMPP / Jabber, I’d highly recommend checking it out and that’s the last corporate pitch you’ll hear out of me). Herewith (as Tim Bray likes to say) are some notes about the tools we used to develop said features.

    • Prototype / script.aculo.us: Before I started working at Jive in April, I had done enough JavaScript to get by but my general attitude toward it was disdain. It seemed like it was always safer and easier to do things on the server. Prototype and script.aculo.us changed all that for me. If you haven’t worked with either: get started now. For whatever reason, the prototype website doesn’t contain a lick of documentation, so head on over to this site and this site once you’re ready to dive in. If nothing else, get rid of all your document.getElementById('thisAndThat') code and replace it with $('thisAndThat'); you’ll feel cleaner afterward. Also, make sure you get the latest version of prototype by downloading the latest version of script.aculo.us because there’s a nasty bug with prototype 1.4 which I’ve written about before.

      Which leads me to script.aculo.us. Go ahead, be amazed at the drag and drop shopping cart, the ‘sortable elements with AJAX callback’ and the sortable floats. Now, put all that useless stuff aside and look at the core library, called Visual Effects, which includes the ability to do just about anything, but makes it really easy to do things like slide elements in and out, show and hide elements, and highlight elements. You can see some of the cool effects in action by watching this movie our marketing team made of the dashboard in action.

    • DWR: I didn’t mention anything about AJAX in the previous two paragraphs because all of the AJAX work is being done by DWR, which takes away all the hassle of binding the objects that exist on your server to the logic on the client and back again. You don’t have to deal with JSON, XML or HTTP requests. On the server you create a class, expose one or more methods and point your DWR config file at the class. On the client, you include a couple DWR JavaScript files and write JavaScript to call methods on the server. It’s drop dead simple. I highly recommend trying it out if you’re using a Java servlet container as your platform. We used DWR in a couple different places, but notably on the dashboard to create a digg.com/spy-esque view of the conversations on the server.
    • JRobin: Before working on this project I had never seen JRobin or heard about RRD. In short, it’s a way of keeping statistics about your system over time without having to maintain giant log files or large databases. So if you ever find yourself in a situation where you want to keep track of oh, let’s say the number of database connections to a server, over a period of minutes, hours, days, weeks, months and years, take a look at JRobin. It’s fast, can record just about anything you want and the size of the log file(s) it generates will be the same on day one as it is on day 323.
    • JFreeChart: I’ve used JFreeChart on a number of projects in the past, but I used it more on this project than any other. It is not the easiest tool in the world to use. There are about 600 different knobs you can turn, which makes it tedious to make something look exactly like you want it to, and the full documentation costs $40, but the product itself is free.
    • iText: Last but not least, we used iText to produce some pretty nifty looking reports in PDF format, which include the graphs created by JFreeChart. I hadn’t ever used iText for anything more than messing around in ColdFusion back in the day, but it was still relatively easy to get something working quickly. It would be fantastic to be able to hand iText an HTML document and say ‘create a PDF’, but you can’t have everything in life.
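
On the JRobin point above, the reason the file doesn’t grow is the round-robin idea itself: a fixed number of slots where the newest sample overwrites the oldest. The sketch below is not JRobin’s API, just a made-up RoundRobinArchive class that shows the principle; RRD-style tools additionally consolidate samples into coarser archives, which is left out here.

```java
// Hypothetical fixed-size archive: storage never grows past `capacity` samples.
class RoundRobinArchive {
    private final double[] slots;
    private long written;   // total number of samples ever recorded

    RoundRobinArchive(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        slots = new double[capacity];
    }

    // Stores a sample, overwriting the oldest one once the archive is full.
    void record(double sample) {
        slots[(int) (written % slots.length)] = sample;
        written++;
    }

    // Number of samples currently retained (never more than the capacity).
    int size() {
        return (int) Math.min(written, slots.length);
    }

    // Average of the retained samples, or NaN if nothing has been recorded.
    double average() {
        int n = size();
        if (n == 0) {
            return Double.NaN;
        }
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += slots[i];
        }
        return sum / n;
    }
}
```

Record a database connection count once a minute into an archive with 1,440 slots and you always have the last day of data, no matter how long the server has been up.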

    And that’s the story! Send me an email if you have any questions about how we integrated the above tools, I’d be happy to help.

Using Apache James and JavaMail to implement Variable Envelope Return Paths

    I submitted a skeleton of the article that follows to JDJ, was told they would like to ‘commission’ it and then finally submitted it only to never hear anything back from them. So instead you get to read it here. Enjoy!

    —————————–

    Anyone who has spent any time working on an application that sends emails has come across more than their fair share of bounced emails. If you actually read the bounced emails, you probably noticed that many of them either came from the ISP’s error handler (mailer-daemon@isp.com) or from an email address that wasn’t on your mailing list. The bounced message may not even have contained a copy of the original message. All of the above scenarios make it very hard to figure out who the original message was sent to. Enter Daniel Bernstein, also known as djb, who in 1997, in response to this problem of matching email bounce messages to subscription addresses, wrote a paper describing a technique he called Variable Envelope Return Paths, or VERP for short. In the paper, he describes the process as:

    …each recipient of the message sees a different envelope sender address. When a message to the djb-sos@silverton.berkeley.edu mailing list is sent to God@heaven.af.mil, for example, it has the following envelope sender:

    djb-sos-owner-God=heaven.af.mil@silverton.berkeley.edu

    If the message bounces, the bounce message will be sent back to djb-sos-owner-God=heaven.af.mil@silverton.berkeley.edu.

    If God is forwarding His mail, the bounce message will still go to djb-sos-owner-God=heaven.af.mil@silverton.berkeley.edu. No matter how uninformative the bounce message is, it will display God’s subscription address in its envelope.
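
The address scheme in the quote can be sketched as a pair of small Java functions. The Verp class and its method names are made up for illustration; the only thing taken from the paper is the layout of the address (prefix, then the recipient with its @ swapped for an equals sign, then the list’s own domain).

```java
// Hypothetical helper for building and unpacking VERP envelope senders.
class Verp {
    // encode("djb-sos-owner-", "God@heaven.af.mil", "silverton.berkeley.edu")
    // yields djb-sos-owner-God=heaven.af.mil@silverton.berkeley.edu
    static String encode(String prefix, String recipient, String bounceDomain) {
        int at = recipient.lastIndexOf('@');
        if (at < 1 || at == recipient.length() - 1) {
            throw new IllegalArgumentException("not an address: " + recipient);
        }
        return prefix + recipient.substring(0, at) + "="
            + recipient.substring(at + 1) + "@" + bounceDomain;
    }

    // Recovers the subscription address from the address a bounce was sent to,
    // or returns null if it is not a VERP address with the given prefix.
    static String decode(String prefix, String bounceRecipient) {
        int at = bounceRecipient.lastIndexOf('@');
        if (at < prefix.length() || !bounceRecipient.startsWith(prefix)) {
            return null;
        }
        String local = bounceRecipient.substring(prefix.length(), at);
        // Domain names cannot contain '=', so the last '=' marks the split.
        int eq = local.lastIndexOf('=');
        if (eq < 1) {
            return null;
        }
        return local.substring(0, eq) + "@" + local.substring(eq + 1);
    }
}
```

Because the subscriber’s address rides along in the envelope, decoding never has to look inside the bounce message itself.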

    But you probably noticed that this article isn’t only about VERP: Apache James is a full featured SMTP, POP3 and NNTP server built using 100% Java and, more importantly, it has been designed from the ground up to be a mail application platform. The James mail application platform makes it a perfect candidate for handling bounced messages using VERP. Similarly, the JavaMail API is a framework for building mail and messaging applications using SMTP and POP3. JavaMail makes it easy to customize the envelope sender address, which means Java developers can utilize JavaMail on the client and James on the server to build email applications that enable VERP.

    This article will describe an example VERP implementation, show how JavaMail can be used to modify the envelope sender address and will then illustrate how James can be used to recognize and process bounced email messages. It is not intended to be an in-depth look at either the Apache James mail server or the JavaMail API. If you’re interested in learning more about Apache James, a product review is available on the Sys-Con.com website (http://java.sys-con.com/read/38667.htm) and an extensive introduction to Apache James is available on IBM developerWorks (http://www-128.ibm.com/developerworks/library/j-james1.html). The JavaMail API is also reviewed on the Sys-Con.com website: http://java.sys-con.com/read/36545.htm.

    VERP and JavaMail
    Let’s start by looking at the email newsletter that a fictional store called ‘Javazon’ is sending to its customers. The developers at Javazon have been using the JavaMail API to successfully send the newsletter through their mail server using code similar to the example below.

    String senderemail = "deals@javazon.com";
    String toemail = "ajohnson@cephas.net";
    Properties props = new Properties();
    props.put("mail.smtp.host", mailserver);
    Session session = Session.getInstance(props, null);
    javax.mail.Message m = new MimeMessage(session);
    m.setFrom(new InternetAddress(senderemail));
    m.setSubject("New Deals at Javazon!");
    m.setRecipient(javax.mail.Message.RecipientType.TO,
    new InternetAddress(toemail));
    m.setContent(content, "text/plain");
    Transport.send(m);

    The above code will produce an email message with headers that look like this:

    Date: Wed, 26 Apr 2006 21:00:21 -0000
    From: deals@javazon.com
    To: ajohnson@cephas.net
    Subject: New Deals at Javazon!

    Because they want to be good email citizens, the developers at Javazon use the POP3 functionality in JavaMail to retrieve the emails that bounce back to the address specified as ‘senderemail’ in the example above. Unfortunately, many of the bounce emails come from daemon accounts (instead of the recipient email address) which makes it difficult to figure out what email address the original message was sent to.

    As mentioned at the start of this article, the only way to address the bounces that come from daemon accounts is to use VERP, which is a two part process. The first is relatively simple. An email message, according to the SMTP RFC-821 Section 2, is composed of two parts: an envelope which contains the SMTP source and destination addresses and the message, which consists of the headers and message body. To create a VERP capable email message, you need only modify the envelope, which is easily accomplished using the instance of java.util.Properties associated with the javax.mail.Session. Modifying the first example, the developers would end up with this:

    String senderemail = "deals@javazon.com";
    String toemail = "ajohnson@cephas.net";
    // Envelope sender: the recipient's address folded into the return path
    String verpFrom = "deals-" + toemail.replaceAll("@", "=") + "@javazon.com";
    Properties props = new Properties();
    // mail.smtp.from sets the SMTP envelope sender, not the From: header
    props.put("mail.smtp.from", verpFrom);
    props.put("mail.smtp.host", mailserver);
    Session session = Session.getInstance(props, null);
    javax.mail.Message m = new MimeMessage(session);
    m.setFrom(new InternetAddress(senderemail));
    m.setSubject("New Deals at Javazon!");
    m.setRecipient(javax.mail.Message.RecipientType.TO,
      new InternetAddress(toemail));
    m.setContent(content, "text/plain");
    Transport.send(m);

    When executed, the code above would create an email message with headers that look like this:

    Return-Path: deals-ajohnson=cephas.net@javazon.com
    Date: Wed, 26 Apr 2006 21:00:21 -0000
    From: deals@javazon.com
    To: ajohnson@cephas.net
    Subject: New Deals at Javazon!

    Notice the “Return-Path:” header that wasn’t in the first email? If the email message bounces back to Javazon, it will go to the email address associated with the ‘Return-Path’ header: “deals-ajohnson=cephas.net@javazon.com” rather than “deals@javazon.com”. This is where Apache James comes into the picture.

    VERP and James
    James can be configured and used like any other email server, but its real power comes from the ability it gives Java developers to plug right into the mail processing pipeline. James enables you to process email messages in the same way you might process HTTP requests that come into a servlet container like Tomcat, but in a more flexible manner. If you want to preprocess (or postprocess) HTTP requests in Tomcat, you first create a class that implements the javax.servlet.Filter interface and then you create an entry in your web.xml that matches certain requests to that class. Your configuration might look something like this:

    <filter>
      <filter-name>myfilter</filter-name>
      <filter-class>com.javazon.web.filters.GZipFilter</filter-class>
    </filter>
    <filter-mapping>
      <filter-name>myfilter</filter-name>
      <url-pattern>*.jsp</url-pattern>
    </filter-mapping>

    The servlet container limits how you match requests to a filter: you are limited to pattern matching on the URL. Instead of a <filter> and <filter-mapping>, James gives you a <mailet>, which is made up of two parts: Matchers and Mailets. They are described on the James wiki:
    “Matchers are configurable filters which filter mail from a processor pipeline into Mailets based upon fixed or dynamic criteria.

    Mailets are classes which define an action to be performed. This can cover actions as diverse as local delivery, client side mail filtering, switch mail to a different processor pipeline, aliasing, archival, list serving, or gateways into external messaging systems.”

    James ships with a number of Mailets and Matchers that you can use without writing a line of code, but the developers at Javazon will need to write their own Matcher and Mailet to handle the bounces generated from their email campaigns.

    So the first thing the developers at Javazon are going to need to do is create a class that intercepts the bounced emails. A matcher class can be created in one of two ways: a) create a class that implements the org.apache.mailet.Matcher interface, or b) create a class that extends the org.apache.mailet.GenericMatcher class. Because GenericMatcher already implements both Matcher and MatcherConfig and because it provides simple versions of the lifecycle methods, the path of least resistance is to extend GenericMatcher. The NewsletterMatcher class is going to ‘match’ only the recipients where the address of the recipient starts with the string “deals-”:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;
    import javax.mail.MessagingException;
    import org.apache.mailet.GenericMatcher;
    import org.apache.mailet.Mail;
    import org.apache.mailet.MailAddress;

    public class NewsletterMatcher extends GenericMatcher {
      public Collection match(Mail mail) throws MessagingException {
        Collection matches = new ArrayList();
        Collection recipients = mail.getRecipients();
        for (Iterator i=recipients.iterator(); i.hasNext();) {
          // getRecipients() holds MailAddress objects, not Strings
          MailAddress recipient = (MailAddress)i.next();
          if (recipient.toString().startsWith("deals-")) {
            matches.add(recipient);
          }
        }
        return matches;
      }
    }

    The NewsletterMatcher class, as you can see, returns a Collection of MailAddress objects, each presumably the VERP address that a bounce was sent to. To do something with these matches, the developers will need to write a class that either implements the org.apache.mailet.Mailet interface or a class that extends the org.apache.mailet.GenericMailet class. Again, it will be simpler to extend the GenericMailet class:

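    import java.util.Collection;
    import java.util.Iterator;
    import javax.mail.MessagingException;
    import org.apache.mailet.GenericMailet;
    import org.apache.mailet.Mail;

    public class NewsletterMailet extends GenericMailet {
      // CustomerManager is Javazon's own class for managing customer data
      private static CustomerManager mgr = CustomerManager.getInstance();
      public void service(Mail mail) throws MessagingException {
        Collection recipients = mail.getRecipients();
        for (Iterator i=recipients.iterator(); i.hasNext();) {
          // recipients are MailAddress objects; toString() yields user@host
          String recipient = i.next().toString();
          if (recipient.startsWith("deals-")) {
            int atIndex = recipient.indexOf("@");
            // deals-user=domain@javazon.com -> user@domain
            String rec = recipient.substring(0, atIndex)
              .replaceAll("=", "@")
              .replaceAll("deals-", "");
            mgr.recordBounce(rec);
            mail.setState(Mail.GHOST);
          }
        }
      }
    }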

    In the above example, the NewsletterMailet class overrides the service() method in the GenericMailet class, loops over the list of recipients in the given email message and then checks to see if the recipient email address starts with the string “deals-”. If the recipient email address does start with “deals-”, then the class decodes the original recipient address by retrieving what is generally the username part of the email address, replacing the equals sign (=) with an @ sign and then removing the “deals-” prefix. Then the NewsletterMailet class uses CustomerManager (a class that the Javazon developers use to manage customer information) to record the bounced email. If you were to step through the process, you’d see the recipient email address start as something like this:

    deals-ajohnson=cephas.net@javazon.com

    and then change to this:

    deals-ajohnson=cephas.net

    and finally to this:

    ajohnson@cephas.net
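
What happens with the decoded address is up to the application. CustomerManager is Javazon’s own (fictional) class, so the following is only a guess at the kind of bookkeeping a recordBounce method might do: count bounces per address and flag addresses that keep bouncing. BounceLedger and its threshold are made-up names for this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical bounce bookkeeping; safe to call from multiple mail threads.
class BounceLedger {
    private final ConcurrentMap<String, Integer> bounces = new ConcurrentHashMap<>();
    private final int threshold;

    BounceLedger(int threshold) {
        this.threshold = threshold;
    }

    // Records one bounce and returns the running total for that address.
    int recordBounce(String address) {
        return bounces.merge(address, 1, Integer::sum);
    }

    // True once an address has bounced at least `threshold` times.
    boolean shouldSuppress(String address) {
        return bounces.getOrDefault(address, 0) >= threshold;
    }
}
```

A newsletter job could check shouldSuppress before each send and skip addresses that have bounced repeatedly.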

    The last step is to wire the mailet and matcher classes together in the Apache James configuration file, which is usually located here:

    $JAMES/apps/james/SAR-INF/config.xml

    You’ll need to make a number of entries. First, you’ll need to let James know where it should look for the mailet and matcher classes you’ve created by creating <mailetpackage> and <matcherpackage> entries inside the <mailetpackages> and <matcherpackages> elements:

    <mailetpackages>
      ...
      <mailetpackage>com.javazon.mailets</mailetpackage>
    </mailetpackages>
    <matcherpackages>
      ...
      <matcherpackage>com.javazon.matchers</matcherpackage>
    </matcherpackages>

    Then add references to the matcher and the mailet using a <mailet> element like this:

    <mailet match="NewsletterMatcher" class="NewsletterMailet">
      <processor>transport</processor>
    </mailet>

    The match attribute of the mailet element names the matcher class that the spool processor instantiates and invokes, and the class attribute names the mailet that should be invoked when the matcher returns any hits.

    After adding these configuration entries and adding the compiled classes to the $JAMES/apps/james/SAR-INF/lib/ directory, restart the James process.

    Testing
    To test the configuration, you’ll need a James server that is reachable from the internet on port 25 and that has a valid DNS name with a corresponding MX record. For example, the system administrator at Javazon would configure a machine with James, make it available to the internet on port 25 and assign it a domain name like bounces.javazon.com. The developers could then use JavaMail to send an email to an invalid address:

    bounceme@javazon.com

    (an account which probably doesn’t exist on the main javazon.com mail server) with a return path of:

    deals-bounceme=javazon.com@bounces.javazon.com

    The bounce email will be sent to the server associated with the MX record for the domain name bounces.javazon.com, which should be the server the system administrator set up above. The NewsletterMatcher class should ‘match’ on the “deals-” prefix and then pass it to the NewsletterMailet, which should record the bounce using the CustomerManager instance.
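
    For the test, the return path has to be built from the recipient address in the first place. Here is a minimal sketch of that encoding step, assuming the same “deals-” prefix and bounces.javazon.com domain used above; the VerpEncoder class and its encode() method are hypothetical names, not part of James or JavaMail.

```java
final class VerpEncoder {
    // Builds a VERP-style return path: the recipient's @ becomes an equals
    // sign and the result is prefixed and pointed at the bounce domain.
    static String encode(String recipient, String bounceDomain) {
        return "deals-" + recipient.replace('@', '=') + "@" + bounceDomain;
    }

    public static void main(String[] args) {
        // prints "deals-bounceme=javazon.com@bounces.javazon.com"
        System.out.println(encode("bounceme@javazon.com", "bounces.javazon.com"));
        // With JavaMail, a value like this would typically be supplied as the
        // envelope sender, for example through the mail.smtp.from property.
    }
}
```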

    Conclusion
    After reading this article, you should hurry on over to the Apache James website, download the latest distribution and read the documentation. There are a number of other interesting ways you can improve your email processing by extending James using mailets and matchers.

    References
    ————————————————————
    JavaMail
    · http://java.sun.com/products/javamail/
    · http://jdj.sys-con.com/read/36545.htm
    · http://www.ibm.com/developerworks/java/edu/j-dw-javamail-i.html

    VERP
    · http://cr.yp.to/proto/verp.txt

    Apache James
    · http://james.apache.org/
    · http://jdj.sys-con.com/read/38667.htm
    · http://www.ibm.com/developerworks/java/library/j-james1.html
    · http://www-128.ibm.com/developerworks/java/library/j-jamess2.html
    · http://james.apache.org/spoolmanager_configuration_2_1.html

    Prototype, tinyMCE, ‘too much recursion’

    If you happen to be using TinyMCE and the Prototype library together and you’re getting a ‘too much recursion’ error in Firefox, make sure to upgrade to the latest version of Prototype, which at the time of this post is version 1.5.0. As near as I can tell, the only ways to get version 1.5.0 are to either a) check it out of the Subversion repository or b) download script.aculo.us, which contains version 1.5.0 in the lib directory of the distribution.

    I think the ‘too much recursion’ Prototype / TinyMCE problem happens because Prototype stores a copy of the Array reverse function in a property called _reverse:

    Array.prototype._reverse = Array.prototype.reverse;

    and then redefines the reverse function, adding an argument ‘inline’:

    reverse: function(inline) {
      return (inline !== false ? this : this.toArray())._reverse();
    }

    Somehow (and I’m not sure how this would happen) the copy happens again, which means that _reverse() now points to the redefined reverse() function, which in turn calls _reverse(), which leads to infinite recursion. Hence, ‘too much recursion’.

    Regardless, the changelog for the latest version of Prototype has a pointer to issue #3951 (although it doesn’t appear that the issues are public). In short, version 1.5.0 of Prototype checks that _reverse() hasn’t already been defined before making the copy:

    if (!Array.prototype._reverse)
      Array.prototype._reverse = Array.prototype.reverse;

    In related news, if you’re using script.aculo.us with TinyMCE, make sure to embed the script.aculo.us js after the TinyMCE js.

    Fiddler, .NET and SOAP

    This morning I spent some time playing with WSE, .NET and Java. I knew Fiddler could listen in on conversations between IE and the world, but I was never able to get it to listen to conversations between me and, well, me (i.e. localhost). It turns out, according to the well-named help page, that

    “… .NET will always bypass the Fiddler proxy for URLs containing localhost. So, rather than using localhost, change your code to refer to the machine name.”

    Easy.

    Nutch, Yahoo!, and Hadoop

    It’s been a while since I mentioned anything about Lucene, my favorite Java-based open source indexing and search library (which I built the karakoram spider / search application around). Doug Cutting, who created Lucene and who has spent the last couple of years working on Nutch, was recently hired by Yahoo!. I just have a couple of questions:

    a) why would Yahoo! want to hire a guy writing a Java-based web crawler and indexer?

    b) where does he get all the cool names? Nutch? Hadoop?

    c) how cool does Hadoop sound? Hadoop Distributed Filesystem (HDFS) and an implementation of MapReduce. Hmm... where else have I heard those terms bandied about?

    Email to Instant Message Gateway

    It’s wicked cold this weekend in Massachusetts so I stayed inside and had some fun with Apache James and Smack (an open source library for communicating with XMPP servers from Jive Software). See, I was looking over this page of Mailet ideas on the Apache James wiki and I saw two different people wanting instant message notification of email delivery. Having spent some time writing a custom mailet or two recently and having been recently reminded of the very cool Smack library, it turned out to be relatively easy to snap the two together to make an email to instant messaging gateway. I don’t want to bore you with the details, so I posted a full description, including installation / configuration instructions, source code and binaries over on my projects page. Enjoy!

    The cost of Calendar object creation

    I spent some quality time with Joshua Bloch this week and by that I mean his book Effective Java. In item number four (the book is organized as a series of fifty-seven items, a format he borrowed from the Effective C++ book by Scott Meyers) he discusses the practice of reusing a single object (instead of creating duplicate objects every time) and uses a Calendar class as an example of an object that you may want to reuse. Near the end of the item, he says “… Calendar instances are particularly expensive to create”, which I’ve heard before, but what does it really mean? Why is it expensive? The JavaDoc for the Calendar class doesn’t attempt to explain why creating instances might be expensive, so I tested it out, just to see how expensive a Calendar object was to create:

    long start = System.currentTimeMillis();
    for (int i=0; i<100000; i++) {
      Calendar c = Calendar.getInstance();
    }
    long end = System.currentTimeMillis();
    System.out.println("Calendar: " + (end - start) + " ms");

    and then compared that to the creation of 100,000 Date objects:

    long start = System.currentTimeMillis();
    for (int i=0; i<100000; i++) {
      Date d = new Date();
    }
    long end = System.currentTimeMillis();
    System.out.println("Date: " + (end - start) + " ms");

    On average on my system, the creation of 100,000 Date objects is anywhere from 10 to 30 times as fast as the creation of 100,000 Calendar objects. So while I never doubted Josh, it's obvious he's right. But that doesn't answer the question, why? The only place to go is the Java source code, which you can download from sun.com. I'll save you the 48MB download and loosely unwrap the creation of a single Calendar object:

    // Calendar c = Calendar.getInstance();
    * createCalendar(TimeZone.getDefault(), Locale.getDefault());
      * new GregorianCalendar(zone, aLocale)
        * super(zone, aLocale);
          * setTimeInMillis(System.currentTimeMillis());
            * computeFields();
              * computeFieldsImpl()
                * timeToFields
                  * internalSet x 6
                * internalSet x 8

    and a single Date object:

    // Date d = new Date();
    * this(System.currentTimeMillis());
      * long fastTime = date;

    So now it should be pretty obvious why a Calendar object is "... particularly expensive to create." The Calendar getInstance() method gets the current time in milliseconds and then unzips that value into the current year, month, date, day of the week, day of the year, milliseconds, seconds, minutes, hour of the day, AM / PM, hour, time zone and daylight saving time offset. The Date object? The only thing it does is store the value of the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC in a member variable.
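
    To make the difference a little more concrete, here is a small sketch that feeds the same millisecond value to both classes. The EpochFields class and its fieldsAt() helper are names I made up for illustration, and the calendar is pinned to UTC so that the fields don't depend on the machine's default time zone.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class EpochFields {
    // Asks a UTC GregorianCalendar to break a millisecond value down into
    // year, month (zero-based), day of month and hour of day.
    static int[] fieldsAt(long millis) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.setTimeInMillis(millis);
        return new int[] {
            c.get(Calendar.YEAR),
            c.get(Calendar.MONTH),
            c.get(Calendar.DAY_OF_MONTH),
            c.get(Calendar.HOUR_OF_DAY)
        };
    }

    public static void main(String[] args) {
        // A Date just hands back the long it was given.
        System.out.println(new Date(0L).getTime()); // prints 0

        // A Calendar has computed separate fields from that same long.
        int[] f = fieldsAt(0L);
        // prints "1970 0 1 0" (month 0 is January)
        System.out.println(f[0] + " " + f[1] + " " + f[2] + " " + f[3]);
    }
}
```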

    Tune in next week for more exciting object creation action.