All posts by ajohnson

JAXB

Thursday, developer meeting day at MINDSEYE. We all put our feet up on the conference table, sip on a Mike’s or a Guinness and ‘interface’. Today Maia did an impromptu presentation on JAXB, a Java technology from Sun that “…automates the mapping between XML documents and Java objects.” Sounds pretty boring, doesn’t it? Stop reading then.

So I’ve written a couple of apps that use Java and a lot of applications that use XML. IMNSHO, the most tedious programming I’ve ever done is the work I’ve done parsing, validating, and hacking at XML. XML is a boon to developers, but who really wants to write ‘lElements.item(0).text’ over and over again? Not many people I know. Anyway, JAXB. JAXB takes the tedium out of using XML, and for that reason alone it’s a great tool. In short, you use JAXB to:

  • unmarshal XML content into a Java representation
  • access, update and validate the Java representation against schema constraints
  • marshal the Java representation of the XML content into XML content

In non-geek terms, that means you can hand an XML document to a Java system that uses JAXB, write about 3 lines of code to turn that XML document into a Java object, pass that Java object around your system, and then at some point easily turn it back into XML. No messing with childNode() or GetElementsByTagName(). Beautiful. But it gets better. JAXB generates Java classes for you that represent the XML documents you pass in, complete with accessor methods, so you can modify the contents of an XML document without knowing any XML syntax. So if you had an XML document that looked like this:

<?xml version="1.0"?>
<application>
   <caching bCache="false" objectttl="0"/>
</application>

you’d get a Java class called ‘caching’ with getters and setters for the bCache and objectttl properties. You could pass in the above XML document and modify the settings in 5 lines of code (pseudo code, not tested or compiled, use at your own risk):

JAXBContext jc = JAXBContext.newInstance( "primer.po" ); // the package containing your JAXB-generated classes
Unmarshaller u = jc.createUnmarshaller();
caching c = (caching)u.unmarshal( new FileInputStream( "config.xml" ) );
c.setBCache( true );
c.setObjectttl( 36000 );
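
Going the other direction is just as short. Picking up where the snippet above leaves off, a rough sketch of marshalling the object back out to XML (again pseudo-code, untested; in practice you’d marshal the document’s root object):

Marshaller m = jc.createMarshaller();
m.setProperty( Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE );
m.marshal( c, new FileOutputStream( "config.xml" ) );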

Not so bad, is it? For the sake of getting you hooked, I neglected to mention that for every type of XML document you want to use, you first have to create an XML Schema Document, but hey, you’re lazy, right? That’s what makes you a good programmer.
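
For the caching example above, the schema might look something like this (a rough sketch; the attribute types are my guesses). You’d run it through xjc, the JAXB binding compiler, to generate the caching class:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="application">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="caching">
          <xsd:complexType>
            <xsd:attribute name="bCache" type="xsd:boolean" default="false"/>
            <xsd:attribute name="objectttl" type="xsd:int" default="0"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>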

Related links:

Binding XML Schema to Java Classes with JAXB: java.sun.com tutorial
JAXB FAQ: [link]
The JAXB API: [xml.com article]
JAXB Mailing List [java.sun.com]
Developing with JAXB and Ant: [onjava.com]
Generate XML Mapping Code with JAXB: [devx.com]
Brett McLaughlin on JAXB: [newinstance.com], note: adding Brett to blogroll. Brett wrote Java & XML: Solutions to Real-World Problems, Building Java Enterprise Applications Vol. II: Web Applications, and Java and XML Data Binding.

Emerging Technology: Who Loves Ya, Baby?

Steven Johnson’s latest article is online @ discover.com: Emerging Technology: Who Loves Ya, Baby?

His thesis is based on Cat’s Cradle, which “… explains how the world is divided into two types of social organizations: the karass and the granfalloon.” I was thinking on the way home tonight about how the exposure of massive amounts of structured data via XML (web services, RSS, XML-RPC…) will make it exponentially easier to create applications like the InFlow software mentioned in the article. Allconsuming.net is one great example of that: Erik Benson gets information from weblogs.com, pings Amazon’s web service and voila, we have a social book circle. While I’m sure the availability of this information worries privacy advocates, you can’t help but get excited about the possibilities that lie ahead of us.

Cronlog

IIS makes it really easy to maintain your log files… in fact, you don’t have to do anything tricky besides checking the appropriate checkbox. Apache, on the other hand, is a bit more complicated… I did a couple of Google searches tonight looking for the perfect, easy five-minute solution, and cronolog is what I found. Download the source, unzip it, run ./configure, run make, run make install, and then change your log settings within the appropriate context of your httpd.conf (I put the following directive within each virtual host definition) to something like this:

CustomLog "|/usr/local/sbin/cronolog /etc/httpd/logs/yoursite.com/%Y%m.log" combined

Save httpd.conf and restart Apache.

That’ll create a log file that looks like this:

/etc/httpd/logs/yoursite.com/200303.log

Pretty handy.

Mintz Levin dot com launch

So after about 12 hours of load testing and final preparations, I’m proud to say that the 2003 version of the Mintz Levin site is now up and running. The site uses web services for publishing information from the staging site to the live site, uses isapi_rewrite to provide search-engine-safe URLs, and is built on probably the latest revision of the MINDSEYE CMS framework we’ve been developing internally for the last couple of months.

If you read my previous posts about full text searching and the speed (or lack thereof) of Verity, you’ll understand why today we actually ripped out the Verity searching and replaced it with the Full Text Searching engine available in SQL Server. A couple of things led to that decision.

First, I couldn’t get anything faster than 250ms per Verity search on an (admittedly slow) single-processor 500MHz Pentium III. Since the site-wide search page allows you to search up to 9 collections (and filter down within those sections for a more fine-tuned search), the search page was taking upwards of 2 seconds to execute, which caused the machine to get overwhelmed with anything more than a couple of simultaneous users. Performance Monitor would show the processor pegged at 100%, and cfstat would show requests being queued and the average request time rapidly growing. After replacing the engine, we got the request time down to (in some cases) 16ms and CF didn’t spike the processor. In fact, we loaded the box up to 150 concurrent users (using Apache JMeter, which is a great tool for doing cheap load testing; check it out sometime) with no issues (mind you, it’s a single-processor 500MHz box). Not bad.

Second, SQL Server Full Text searching gave us more flexibility: if we wanted to order the results by datecreated or by something other than label or relevance, we could simply add an ‘order by’ to the SQL query.

Finally, because the SQL Server Full Text engine runs within a cfquery, you can add the cachedwithin attribute to the query to cache the results directly in RAM. Verity doesn’t have a cachedwithin attribute; the only way to cache its results would be to stick the resulting query into application or server scope, which by itself isn’t all that bad, but it requires more coding than simply adding an attribute to a tag.

The only time I might use Verity (i.e. where SQL Server Full Text Searching would not work) would be where I needed to index the file system (HTML, PDF, and Word files) or where I needed to spider a site. However, even there, if you have the time to write extra code, it might make sense to use a freely available spider or text-extraction tool to read the documents on the file system or spider the site, and then store the resulting information in SQL Server as text. SQL Server Full Text Searching is obviously tied to a Microsoft/Windows environment, and I haven’t yet looked at how MySQL handles full text searching, although I know that Maia has done some work with it on nehra.com. I’d be interested in hearing about anyone else’s experience with full text searching using MS SQL, MySQL or Verity. If you’re reading this post and are having issues with Verity and you don’t have access to MySQL or SQL Server with full text indexing, you might look at writing a CFX that wraps Lucene (an idea that Joe commented on).
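
The Java side of that CFX isn’t much code. Here’s a minimal sketch of indexing and searching with Lucene (using the Lucene 1.x API that was current at the time; the index path and field names are just made up for illustration):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SiteSearch {
    public static void main(String[] args) throws Exception {
        // build (or rebuild) the index -- one Document per file or database row
        IndexWriter writer = new IndexWriter("c:/search/index", new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Keyword("path", "/docs/annual-report.pdf"));        // stored, not tokenized
        doc.add(Field.Text("contents", "text extracted from the document")); // tokenized and indexed
        writer.addDocument(doc);
        writer.close();

        // search the index and print the matching paths
        IndexSearcher searcher = new IndexSearcher("c:/search/index");
        Query query = QueryParser.parse("annual report", "contents", new StandardAnalyzer());
        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.doc(i).get("path"));
        }
        searcher.close();
    }
}

Wrap the search half of that in a class implementing ColdFusion’s CFX interface and you’d have something like a <CFX_LUCENE> tag you could drop in wherever Verity is too slow.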

Update: David and Ray sent a couple of links:

Building Search Applications for the Web Using Microsoft SQL Server 2000 Full-Text Search [source]

Determining Current Version of CFMX [source, source]

The Goal: A Process of Ongoing Improvement

I finished reading “The Goal: A Process of Ongoing Improvement” today (a sunny, windy, beautiful day in Boston, btw), which Joel pointed to a couple of months back. Like Joel, I’ll find it useful should I ever need to run a factory, but it also struck a chord with me on a non-programming, more business-like level. One of the takeaways from the book was the description of what the goal of a for-profit organization should be:

“… so I can say that the goal is to increase net profit, while simultaneously increasing both ROI and cash flow and that’s the equivalent of saying the goal is to make money…. They’re measurements which express the goal of making money perfectly well, but which also permit you to develop operational rules for running your plant. Their names are throughput, inventory and operational expense… Throughput is the rate at which the system generates money through sales…. Inventory is all the money that the system has invested in purchasing things which it intends to sell…. Operational expense is all the money the system spends in order to turn inventory into throughput.” (pg 59-60)

Selling physical widgets is a much different ballgame than selling services or selling software, so I’m having a hard time imagining how this might apply to a web shop like MINDSEYE, but it’s an interesting mental exercise nonetheless.

Another interesting concept (which I’m guessing is covered in the Critical Chain book) is the idea that in a multi-step synchronous process, the fluctuations in speed of the various steps result not in an averaging out of the time the system needs to run, but rather in an accumulation of the fluctuations. His example was a hike with 15 Boy Scouts: if you let one kid lead the pack and that kid averages between 1.0 and 2.0 mph, the distance between the first and the 15th scout will never average out but will gradually increase with time, because “… dependency limits the opportunities for higher fluctuations.” (pg 100) In this case, the first scout doesn’t have any dependencies, but each of the next 14 does: the person in front of them. Thus, as the hike progresses, each of the other 14 can never go faster than the person in front of them, and so on. This actually does have some usefulness within our daily jobs as programmers. Each of us depends on someone in one way or another, whether it be for design assets or IA docs or a software function. A breakdown in one step of a project can sometimes be made up through sheer willpower and a lot of caffeine, but most likely it will result in the project being delayed. The same thinking can be applied to the software we write: slow functions, or rather the statistical fluctuations in functions (fast or slow), don’t average out so much as they accumulate, which I guess is kind of obvious, but worth noting.
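
If you want to see the effect for yourself, here’s a toy simulation of the hike (my own illustration, not from the book): every minute each scout picks a random speed between 1.0 and 2.0 mph, but nobody can pass the person in front of them. Run it and the gap between the first and last scout keeps growing instead of averaging out.

import java.util.Random;

public class HikeSimulation {
    public static void main(String[] args) {
        int scouts = 15;
        int minutes = 600;                 // a ten-hour hike
        double[] position = new double[scouts];
        Random rand = new Random();

        for (int t = 1; t <= minutes; t++) {
            for (int i = 0; i < scouts; i++) {
                double speed = 1.0 + rand.nextDouble();   // 1.0 to 2.0 mph
                double next = position[i] + speed / 60.0; // miles covered this minute
                if (i > 0) {
                    // the dependency: you can't walk through the scout ahead of you
                    next = Math.min(next, position[i - 1]);
                }
                position[i] = next;
            }
            if (t % 120 == 0) {
                System.out.println("after " + t + " minutes the line is "
                        + (position[0] - position[scouts - 1]) + " miles long");
            }
        }
    }
}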

Other nuggets to chew on this morning: “The maximum deviation of a preceding operation will become the starting point of a subsequent operation.” (pg 133)

On bottlenecks within an operation: “… Make sure the bottleneck works only on good parts by weeding out the ones that are defective. If you scrap a part before it reaches the bottleneck, all you have lost is a scrapped part. But if you scrap the part after it’s passed the bottleneck, you have lost time that cannot be recovered.” (pg 156) Do you have expensive (in terms of processor cycles) operations in a program after which you then scrub the data for quality? Why not put the scrub before the expensive operation and save yourself some cycles?

On problem solving: “… people would come to me from time to time with problems in mathematics they couldn’t solve. They wanted me to check their numbers for them. But after a while I learned not to waste my time checking the numbers — because numbers were almost always right. However, if I checked the assumptions, they were almost always wrong.” (pg 157) Next time someone asks you to review some code that isn’t producing the correct results, attack their assumptions before going over the code with a fine-tooth comb.

BTW, the book is written as a story, so if you like reading stories and want to stretch the business side of your brain but don’t enjoy business textbooks, this book is for you!

Update: Jeff Sutherland has a great review of the book from a technical perspective here.

new Macromedia.com

So the new Macromedia.com site went live yesterday, and there’s been lots of reaction to it on the various lists I’m on. Notably, very few of the reactions were positive; most were bashing the use of Flash or the speed or [insert pet peeve here]. How about a round of applause for pushing the envelope and drinking your own medicine? They’ve just converted an enormous site (really a combination of two distinct companies just a short time ago) into a massive rich internet application. Congratulations on a job well done! [via JD on MX]