Category Archives: Software Development

Automating Application Deployment with Tomcat

I’m asking a bunch of questions here, so if you’re expecting answers, look elsewhere. In the past couple of years, I’ve waded through a bunch of different ways to configure and build Java web applications. At first, I hardcoded connection strings, settings and message strings in the source code, compiled using Eclipse and copied the JSP and class files over to the web server (lame, I know, but you gotta start somewhere). As the applications I wrote got more complex and as I got smarter, I started using Ant to perform the build, and I learned about properties files and web.xml. After unpacking a lot of other open source Java applications, I’m now using log4j for logging and error messaging, JNDI for configuration (datasource and application configuration stored in Tomcat’s server.xml), resource bundles for storing internationalized message strings, JUnit for running tests and Ant for cleaning, testing and building my apps into a deployable format (war files).

And that’s where it stops being easy. Deployments suck. We have a pretty small environment (a couple of test servers, a couple of staging servers, a couple of live servers) and deploying changes to those servers is tedious. For example, in my development environment, I like to have log4j append any and all information to the console, which lets me watch the system as it starts up and runs:

log4j.category.com.mycompany=INFO, STDOUT

But once I build the application and deploy it to the live environment, I only want error messages and I want them sent to the SMTPAPPENDER:

log4j.category.com.mycompany=ERROR, SMTPAPPENDER

so I’m stuck editing a text file every time we deploy an application. It’s not that big of a deal, but there’s more: I also have applications on each server that need the appropriate entries in Tomcat’s server.xml (environment entries, JDBC connections, etc.), sometimes Tomcat needs to be restarted after you deploy the war, applications are deployed to different directories on different machines, and sometimes the application being rolled out doesn’t work and you need to roll back to the previous version. And how do you keep track of all the live / staging / development servers? The environment I work in is pretty small, so all of this can be done by hand, but it’s tedious, boring and error prone. How do you guys who work in larger environments do it? How do you move the .war files from your staging environment to the live environment? Using Ant? Do you trigger Ant tasks on the live servers that check out source code from CVS and build the apps there? Do you restart Tomcat every time? Do you do half of your machines at a time and then the other half? You can’t be doing this by hand! Any tips?
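One common way to avoid hand-editing the log4j file on every deploy, offered here as a sketch rather than an answer to the questions above, is to keep shared defaults plus a small per-environment override and pick the override at startup. The `deploy.env` system property, the `EnvConfig` class and the override table are hypothetical; in a real build the strings would more likely live in separate files (say, one properties file per environment) that Ant copies into the war.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Minimal sketch: shared settings plus per-environment overrides.
// The "deploy.env" property name and the override table are hypothetical.
final class EnvConfig {
    // Defaults used on development machines.
    static final String BASE =
        "log4j.category.com.mycompany=INFO, STDOUT\n";

    // Hypothetical per-environment overrides; only keys that differ are listed.
    static String overridesFor(String env) {
        switch (env) {
            case "live":
                return "log4j.category.com.mycompany=ERROR, SMTPAPPENDER\n";
            default:
                return "";
        }
    }

    // Loads the base settings, then the overrides; a key loaded twice keeps the later value.
    static Properties load(String env) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(BASE));
            props.load(new StringReader(overridesFor(env)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // e.g. java -Ddeploy.env=live EnvConfig
        String env = System.getProperty("deploy.env", "dev");
        System.out.println(load(env).getProperty("log4j.category.com.mycompany"));
    }
}
```

Running `java -Ddeploy.env=live EnvConfig` would print `ERROR, SMTPAPPENDER`, while the default prints `INFO, STDOUT`. Listing only the keys that differ keeps the live overrides short, so there is less to get wrong on deploy day.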

daily links

· Bloglines API

· Paul Graham on what the Bubble got right: “… what would be wrong would be that how one presented oneself counted more than the quality of one’s ideas. That’s the problem with formality. Dressing up is not so much bad in itself. The problem is the receptor it binds to: dressing up is inevitably a substitute for good ideas. It is no coincidence that technically inept business types are known as ‘suits.’”

· LionShare: “… an innovative effort to facilitate legitimate file-sharing among individuals and educational institutions around the world.”

· .NET Reflector: lets you view the source code of most .NET Framework classes and of any code written in .NET. So how does one write / compile code in such a way that someone else can’t disassemble it?

The ‘using’ keyword

Joe pointed out that the C# PGP decryption code I wrote could be better: specifically, I should be checking the ExitCode property of the Process instance, and I should make certain that I dispose of the Process instance by calling its Close() method when I’m done using it. The ExitCode improvement is relatively simple to add: start the process, read any lines of output, wait for it to exit and then check the ExitCode to see if everything went smoothly:

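Process process = Process.Start(psi);
...
string line;
// read output until pgp closes its standard output
while ((line = process.StandardOutput.ReadLine()) != null) {
    ... // do stuff with each line
}
// ExitCode is only valid once the process has exited
process.WaitForExit();
if (process.ExitCode != 0) {
    // something went wrong..
}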

The second thing that Joe mentioned was the ‘using’ statement, which is a novel feature of C# that provides “… an easy way for C# code to acquire and release unmanaged or otherwise precious resources.” The code I originally wrote never released the handle to the process; after all was said and done, I should have called the Close() or Dispose() method of the process, and done it in a finally block so that it runs even if something goes wrong:

Process process = null;
try {
    process = Process.Start(psi);
    ...
} finally {
    // runs whether or not an exception was thrown
    if (process != null) {
        process.Close();
    }
}

The ‘using’ statement is syntactic sugar for this well-worn try / finally idiom and shortens the above example to:

using (Process process = Process.Start(psi)) {
    ... // do stuff w/ the process
}

which then automatically calls the Dispose() method of the process.

Joe goes on to hijack the ‘using’ statement in some other novel ways, which you should check out when you have a chance.
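Java developers reading this will recognize the pattern: since Java 7, try-with-resources does for any `AutoCloseable` what `using` does for an `IDisposable`. The sketch below uses a hypothetical `TrackedHandle` class in place of a real process handle and records when it is opened and closed, to show that `close()` runs whether the block finishes normally or throws.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Java's counterpart to C#'s 'using': try-with-resources (Java 7+).
// TrackedHandle is a hypothetical stand-in for something like a process handle.
final class UsingDemo {
    static final List<String> log = new ArrayList<>();

    static final class TrackedHandle implements AutoCloseable {
        private final String name;

        TrackedHandle(String name) {
            this.name = name;
            log.add("open " + name);
        }

        void work(boolean fail) {
            log.add("work " + name);
            if (fail) {
                throw new IllegalStateException("work failed");
            }
        }

        @Override
        public void close() {
            // Runs when the try block exits, normally or via an exception.
            log.add("close " + name);
        }
    }

    static void run(boolean fail) {
        try (TrackedHandle h = new TrackedHandle("pgp")) {
            h.work(fail);
        }
    }

    public static void main(String[] args) {
        run(false);
        try {
            run(true);
        } catch (IllegalStateException expected) {
            log.add("caught");
        }
        System.out.println(log);
    }
}
```

Running `main` prints `[open pgp, work pgp, close pgp, open pgp, work pgp, close pgp, caught]`: the handle is closed before the exception reaches the caller, just as `Dispose()` is called before an exception leaves a `using` block.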

Conflicting mindsets of C# vs. Java: Part II

You all read the ‘Conflicting mindsets of C# vs. Java’ weblog post, right? And you all noticed that the guys running the Lucene.NET project on SourceForge closed up shop, took all their toys and went on home, right? I’m gonna go out on a limb and say that the two are related.

The way I see it, *in general* the .NET community conversation is dominated by talk about the latest and greatest that Microsoft is putting out: MapPoint Location Server, SQL Server, Longhorn, ASP.NET 2.0 and Visual Studio, all products of Redmond. Meanwhile, the equivalent group of Java developers is talking about JBoss, Hibernate, Struts and Eclipse, none of which came out of Silicon Valley.

Malcolm’s mindset #1 says that .NET developers accept the tools and services that Microsoft provides them, and I think for the most part this is true. You don’t see .NET developers spending their cycles on persistence layers, web application frameworks or caching solutions, probably because Microsoft has provided its own solutions for all these problems. But if it were just a matter of being handed tools, then why aren’t JSF, JDO and NetBeans dominating the javablogs conversations? Seriously, take a look at ASP.NET and JSF. They aren’t that different, and yet ASP.NET is widely used in conjunction with Visual Studio while JSF is rarely lauded and more often derided. I think he’s right: it’s really a mindset.

Which brings me back to the Lucene.NET guys. Why would they close up shop? Why not continue to donate their time and energy to an excellent cause? Maybe the Microsoft mindset has something to do with it. Consider this: a search on google for ‘lucene’ within the weblogs.asp.net domain yields exactly 17 results. The same search on jroller.com yields 2570 results. Admittedly, Lucene has been around longer, but maybe one of the reasons that the Lucene.NET guys packed it up (and are now trying to sell their work) is that no one paid any attention to them because everyone was too busy working with SQL Server full-text indexing, a tool given to them by Microsoft (but one that costs thousands of dollars per processor). Another reason that a project like Lucene or Struts or Tomcat flourishes is that there is a certain amount of prestige in working on a big open source project. If you work on open source projects for the prestige and you’re not getting the attention you think you deserve, you find another motivation. In their case the motivation was money, so they closed up the project on SourceForge and they’re now selling a personal edition and a business edition. They might make a couple of bucks, but I bet a year from now there won’t be many people writing about searchblackbox.com.

So what’s my point? That all .NET developers are greedy and don’t care about the community? Not really. I think it’s that the two communities have different bus drivers: .NET developers look to Microsoft to provide the tools they need to do their jobs… and if they look elsewhere or copy something else, Microsoft will eventually come in and make a product of its own that does the job, thereby negating any work the developers did in the meantime. Microsoft drives the bus. Java developers look at the products and specs that Sun puts out and then go and build their own tools or frameworks or applications to do the job. Sun will eventually put out something through the JCP that does the job… but the developers in the Java community will only use it if they want to; witness the continued popularity of Struts and the lack of interest in JSF. In the Java camp, the developers drive the bus.

daily links

· The guys at Search Engine Watch have a blog now…

· Python date handling tutorial

· Java Open Single Sign-On Project: an open source J2EE-based SSO infrastructure aimed to provide a solution for centralized platform neutral user authentication.

· Better Software magazine: delivers relevant, timely information to tackle the challenges of building better quality software, regardless of your role in the software development lifecycle.

· Better Software magazine on Continuous Integration

· CruiseControl: I need to start using this…

· No balls. No babies.

PGP Decryption using C#

Sorry if the PGP meme is getting old. I had to write some new code yesterday that decrypted a file full of ciphertext, and I didn’t see any other examples on the net, so it gets posted here for posterity. Just to frame the issue: the ciphertext is actually stored in the database, so I first extract the ciphertext from a text column, write the ciphertext to a file, decrypt the file, read the results and then delete the file:

using System;
using System.Diagnostics;
// get ciphertext from DB
// and write to a file
// ...
string passphrase = "my_passphrase";
// verbatim string: "\e" is not a valid escape sequence in a regular C# string
string filename = @"myapp\encrypted.asc";
ProcessStartInfo psi = new ProcessStartInfo("pgp");
psi.UseShellExecute = false;
psi.RedirectStandardInput = true;
psi.RedirectStandardOutput = true;
psi.RedirectStandardError = true;
// -m prints the plaintext to the screen, -z passes the passphrase
psi.Arguments = filename + " -m -z " + passphrase;
Process process = Process.Start(psi);
string line = null;
string message = null;
// the message is the last line of output
while((line = process.StandardOutput.ReadLine()) != null) {
    message = line;
}
Console.WriteLine("message = " + message);

If you’re scoring at home, I used the ProcessStartInfo and Process classes from the .NET Framework to invoke pgp.exe from the command line, passing the -m flag so that the decrypted message is printed to the screen (instead of pgp.exe decrypting the message to a new file) and the -z flag so that I can send the passphrase as an argument as well. In my project the message is only one line, so I iterate over the lines of output until I get to the last line, which is then saved in the message string.

Peeling away the code, you end up with this:

C:\pgp6.5.8>pgp c:\myapp\encrypted.asc -m -z my_passphrase
Pretty Good Privacy(tm) Version 6.5.8
(c) 1999 Network Associates Inc.
Uses the RSAREF(tm) Toolkit, which is copyright RSA Data Security, Inc.
Export of this software may be restricted by the U.S. government.
moreflagFile is encrypted. Secret key is required to read it.
Key for user ID: Aaron Johnson
1024-bit DSS key, Key ID ******, created 2004/09/02
Key can sign.
Just a moment...
this is the message

A caveat: if you run the above code from an ASP.NET web application, make sure that the ASPNET user has access to the private key.

By the way, the folks over at Bouncy Castle have a C# port of their excellent Java encryption libraries, but it doesn’t appear that the org.bouncycastle.openpgp package has been ported just yet; otherwise I would have used that.
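The C# code above writes the ciphertext to a file but doesn’t show the last step described at the top of this post, deleting that file. Here is a sketch of the whole write / decrypt / read / delete sequence in Java, assuming a hypothetical `decryptor` function that runs the decryption tool on a file and returns its output lines; the sketch does not call PGP itself. The `finally` block removes the temporary file even when decryption fails, and the message is taken from the last non-blank line of output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch of the write / decrypt / read / delete sequence.
// The decryptor is a hypothetical stand-in for running pgp on the file and
// collecting its output lines; nothing here calls a real PGP tool.
final class CiphertextFile {
    static String decrypt(String ciphertext, Function<Path, List<String>> decryptor) {
        Path file = null;
        try {
            file = Files.createTempFile("encrypted", ".asc");
            Files.write(file, ciphertext.getBytes(StandardCharsets.UTF_8));
            return lastNonBlankLine(decryptor.apply(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Delete the ciphertext file whether or not decryption succeeded.
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // best-effort cleanup
                }
            }
        }
    }

    // The message is the last line of output; trailing blank lines are skipped.
    static String lastNonBlankLine(List<String> lines) {
        for (int i = lines.size() - 1; i >= 0; i--) {
            if (!lines.get(i).trim().isEmpty()) {
                return lines.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Fake decryptor standing in for pgp: ignores the file and returns canned output.
        String message = decrypt("ciphertext", file ->
            Arrays.asList("Just a moment...", "this is the message", ""));
        System.out.println(message);
    }
}
```

The `main` method swaps in a fake decryptor that returns canned output, so it prints `this is the message` without needing PGP installed. Skipping blank lines is a small hedge against a trailing empty line, which would otherwise become the message in the C# loop above.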

Extracting Text From MS Word

Someone on the Lucene User list wanted to know if it was possible to search MS Word documents using Lucene. The normal response is to go and take a look at the Jakarta POI project (new blog by the way). Ryan Ackley submitted his website (textmining.org) along with a plug for his TextMining.org Word Text Extractor v0.4 and some sample code:

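import java.io.FileInputStream;
import org.textmining.text.extraction.WordExtractor;

FileInputStream in = new FileInputStream("test.doc");
WordExtractor extractor = new WordExtractor();
// the stream has to be handed to extractText()
String str = extractor.extractText(in);
in.close();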

Nice.

Someone else noted that the Python version of Lucene (called Lupy) has an indexer for MS Word and PDF as well, although it appears to only work on Windows.

ebay web services talk by Jeffrey McManus

A couple of weeks ago I attended the Boston .NET User Group to hear Jeffrey McManus give a talk on how ebay is using web services. I took a bunch of notes during his presentation, which I’m putting up here semi-edited. The first part of the talk was mainly about ebay the company: how much business they do per quarter, how many users they have, etc. The second part was about the ebay API, which I was interested in because the company I’m working for now is exploring a similar program, so you’ll see more detailed notes there:

developer.ebay.com

45000 product categories
collectibles was the first product category on ebay, it is category id = 1
ebay did approximately $5.6 billion in cars last year

105 million registered users as of 3/1/2004
rate of growth is accelerating quarter over quarter
28 countries with a physical presence and localized website
20 million items for sale at any one time…
1 billion items were listed last year
7 gigabits per second outgoing traffic
10 billion api hits in 2004

in December 2003, 1 of every 3 people on the internet came to ebay

1 in 3 used golf clubs bought in the US are purchased on ebay
valueguide.pga.com uses ebay data to help determine the price of used golf equipment

Ebay isn’t just for knick knacks: more than 125,000 keyword searches for Louis Vuitton, Coach and Prada on ebay every day

superpawn.com
— they built a proprietary POS with ebay and significantly reduced costs
— used to cost $23 per item to list & describe each item on ebay (person costs), now approx $.25

highline auctions
— custom system integrator with a business around ebay integration, specifically in cars
— reduced the time it takes to create a professional listing from 30 minutes to under 3 minutes

auctionworks
— solution provider on ebay

use cases
— integration w/ business tools (quickbooks, excel, sap, etc..) (ie: a donation can be estimated for taxes using a data licensing application)
— integration w/ back ends for medium & large retailers, manufacturers

‘we make inefficient markets efficient’
scarce –> in-season retail –> end of life –> used/vintage
— end of life scenario for retailers
— products at the end of life can be put on ebay programmatically, saving companies the cost of having to get rid of excess product

top level meta categories
— sports, tickets, etc…
— ‘browseability’ is one of the challenges
— most likely use cases are sellers listing items programmatically

data licensing services
— you can purchase data feeds from ebay so that you can find out what the given price for a given car in a given location is
andale.com: provides ebay pricing and selling recommendations

API

why create jSearch?

One of the comments posted to the blog entry introducing jSearch asked why I thought it needed to be created when a tool like nutch already exists. nutch is a massive undertaking; its aim is to create a spider and search engine capable of spidering, indexing and searching billions of web pages while also providing a close shave and making breakfast. nutch wants to be an open source version of google. I created jSearch to be a smaller version of google, indexing single websites or even subsections of a website; it’s more like a departmental or corporate spider, indexing and searching system. If you download the application, you’ll see that jSearch provides some of the same functionality that google does: cached copies of web pages, an XML API (using REST instead of SOAP), logging and reporting of searches, and content summarization. Sure, you could use the google web API to provide the same search on your own site, but then you’re limited to the number of searches that google allows per day with the API (1000), you’re making calls over your WAN to retrieve search results, and you have less control (i.e., you couldn’t have google index your intranet unless you purchased their appliance).

The second reason I created jSearch was that it was, and is, an interesting problem to work on. I now have a unique appreciation for the problems that google (or any other company that has created a spider and search engine) has faced. Writing a spider is not a trivial task. Creating a 2 or 3 sentence summary of an HTML page (technically called ‘text summarization’) is a topic for a master’s thesis. And putting a project like this together becomes a study of the various frameworks for search (Lucene), persistence (Hibernate) and web application development (Struts), which is software engineering in itself.

And really, why not? I enjoyed it. It was interesting and I learned something along the way and I plan on using it.
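To give a feel for the text summarization problem mentioned above, here is a deliberately naive extractive approach in Java: score each sentence by how often its words appear in the whole page, keep the top few, and return them in their original order. This is a swapped-in classic frequency-based technique, not how jSearch does it, and the tiny stop-word list and regex sentence splitter are assumptions; handling HTML, abbreviations and redundant sentences is where the thesis-sized work begins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Naive frequency-based extractive summary: score each sentence by how
// common its words are in the whole text, keep the top N, and return them
// in their original order. Illustrative only; not jSearch's algorithm.
final class NaiveSummarizer {
    // Hypothetical tiny stop-word list; a real one would be much longer.
    static final List<String> STOP_WORDS =
        Arrays.asList("a", "an", "and", "the", "of", "to", "is", "it", "in", "that");

    static List<String> words(String sentence) {
        List<String> out = new ArrayList<>();
        for (String w : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) {
                out.add(w);
            }
        }
        return out;
    }

    static List<String> summarize(String text, int maxSentences) {
        // Split after ., ! or ? followed by whitespace (an assumption; abbreviations break it).
        String[] sentences = text.trim().split("(?<=[.!?])\\s+");
        Map<String, Integer> freq = new HashMap<>();
        for (String s : sentences) {
            for (String w : words(s)) {
                freq.merge(w, 1, Integer::sum);
            }
        }
        // Score = average frequency of the sentence's content words.
        double[] score = new double[sentences.length];
        for (int i = 0; i < sentences.length; i++) {
            List<String> ws = words(sentences[i]);
            for (String w : ws) {
                score[i] += freq.get(w);
            }
            score[i] = ws.isEmpty() ? 0 : score[i] / ws.size();
        }
        // Mark the top-scoring sentences (ties go to the earlier sentence).
        boolean[] keep = new boolean[sentences.length];
        for (int k = 0; k < Math.min(maxSentences, sentences.length); k++) {
            int best = -1;
            for (int i = 0; i < sentences.length; i++) {
                if (!keep[i] && (best < 0 || score[i] > score[best])) {
                    best = i;
                }
            }
            keep[best] = true;
        }
        // Emit the kept sentences in document order.
        List<String> summary = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            if (keep[i]) {
                summary.add(sentences[i]);
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        String page = "Lucene indexes text. Lucene searches text quickly. The weather was nice.";
        System.out.println(summarize(page, 1));
    }
}
```

On the three-sentence sample in `main`, the first sentence scores highest (its content words "lucene" and "text" each appear twice), so the one-sentence summary is `[Lucene indexes text.]`.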