Category Archives: Open Source

PGP Decryption using C#

Sorry if the PGP meme is getting old. Yesterday I had to write some new code that decrypts a file full of ciphertext, and I didn’t see any other examples on the net, so it gets posted here for posterity. Just to frame the issue: the ciphertext is actually stored in the database, so I first extract it from a text column, write it to a file, decrypt the file, read the results and then delete the file:

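using System;
using System.Diagnostics;

// get ciphertext from DB
// and write to a file
// ...
string passphrase = "my_passphrase";
// verbatim string (@"...") so "\e" isn't parsed as an escape sequence
string filename = @"myapp\encrypted.asc";
ProcessStartInfo psi = new ProcessStartInfo("pgp");
psi.UseShellExecute = false;
psi.RedirectStandardInput = true;
psi.RedirectStandardOutput = true;
psi.RedirectStandardError = true;
// -m: print the decrypted message to stdout; -z: pass the passphrase as an argument
psi.Arguments = filename + " -m -z " + passphrase;
Process process = Process.Start(psi);
string line = null;
string message = null;
// the message is a single line, so keep only the last line of output
while ((line = process.StandardOutput.ReadLine()) != null) {
    message = line;
}
process.WaitForExit();
Console.WriteLine("message = " + message);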

If you’re scoring at home, I used the ProcessStartInfo and Process classes from the .NET Framework to invoke pgp.exe from the command line, passing the -m flag so that the decrypted message is printed to the screen (instead of pgp.exe decrypting the message to a new file) and the -z flag so that I can send the passphrase as an argument as well. In my project the message is only one line, so I iterate over the lines of output until I get to the last line, which is then saved in the message string.

Peeling away the code, you end up with this:

C:\pgp6.5.8>pgp c:\myapp\encrypted.asc -m -z my_passphrase
Pretty Good Privacy(tm) Version 6.5.8
(c) 1999 Network Associates Inc.
Uses the RSAREF(tm) Toolkit, which is copyright RSA Data Security, Inc.
Export of this software may be restricted by the U.S. government.
moreflagFile is encrypted. Secret key is required to read it.
Key for user ID: Aaron Johnson
1024-bit DSS key, Key ID ******, created 2004/09/02
Key can sign.
Just a moment...
this is the message

A caveat: if you run the above code from an ASP.NET web application, make sure that the ASPNET user account has access to the private key.

By the way, the folks over at Bouncy Castle have a C# port of their excellent Java encryption libraries, but it doesn’t appear that the org.bouncycastle.openpgp package has been ported just yet; otherwise I would have used that.

why create jSearch?

One of the comments posted to the blog entry introducing jSearch asked why I thought it needed to be created when a tool like Nutch already exists. Nutch is a massive undertaking: its aim is to create a spider and search engine capable of spidering, indexing and searching billions of web pages while also providing a close shave and making breakfast. Nutch wants to be an open source version of Google. I created jSearch to be a smaller version of Google, indexing single websites or even subsections of a website; think of it as a departmental or corporate spidering, indexing and searching system. If you download the application, you’ll see that jSearch provides some of the same functionality that Google does: cached copies of web pages, an XML API (using REST instead of SOAP), logging and reporting of searches, and content summarization. Sure, you could use the Google Web API to provide the same search on your own site, but then you’re limited to the number of searches Google allows per day with the API (1,000), you’re making calls over your WAN to retrieve search results, and you have less control (i.e., you couldn’t have Google index your intranet unless you purchased their appliance).

The second reason I created jSearch was that it was, and still is, an interesting problem to work on. I now have a unique appreciation for the problems that Google (or any other company that has built a spider and search engine) has faced. Writing a spider is not a trivial task. Creating a 2 or 3 sentence summary of an HTML page (technically called ‘Text Summarization’) is a topic for a master’s thesis. And putting a project like this together becomes a study of the various frameworks for search (Lucene), persistence (Hibernate) and web application development (Struts), which is itself an exercise in software engineering.

And really, why not? I enjoyed it. It was interesting and I learned something along the way and I plan on using it.

UBL 1.0

Last week Tim Bray mentioned the May 1st release of UBL 1.0, which he defines as “… a set of general-purpose XML-encoded business documents: orders, acknowledgments, packing slips, invoices, receipts.” He goes on to compare UBL to HTML, saying that because UBL is a generic format rather than a format made for a particular industry (just as HTML was a generic, simpler subset of SGML), it has a chance to become the HTML of the business document world (read: explosive growth, eventual ubiquity). Tim quotes an email from Jon Bosak on some of the other reasons for the creation of UBL:

· Developing and maintaining multiple versions of common business documents like purchase orders and invoices is a major duplication of effort.
· Creating and maintaining multiple adapters to enable trading relationships across domain boundaries is an even greater effort.
· The existence of multiple XML formats makes it much harder to integrate XML business messages with back-office systems.
· The need to support an arbitrary number of XML formats makes tools more expensive and trained workers harder to find.

My current project, which should be released soon, uses software from many different companies: tax software, credit card software, shipping rate software, custom software written by the company that manages the distribution of product, etc. Obviously, having a single format to work with would decrease the time I spend a) digging through each company’s documentation trying to understand its format and b) wiring up the custom documents for each format, so I’m definitely looking forward to the day when I can use UBL.

For anyone interested, it looks like there is a smattering of support for UBL out there in the Java world: http://softml.net/jedi/ubl/sw/java/, https://jwsdp.dev.java.net/ubl/, http://www.sys-con.com/story/?storyid=37553&DE=1. For further information regarding UBL, see the OASIS UBL TC web page at:
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl

Hibernate: Non object retrieval

Hibernate has significantly reduced the amount of time I’ve spent writing and maintaining SQL in the applications I’m working on. Because it exists to map data from Java classes to database tables and back, there aren’t a lot of examples on the site for getting non-object data out of the database (for instance, if you’re reporting on the existing data). That’s not to say it’s not possible! Given a Query object, call its list() method and then iterate over the resulting List. Calling get() on the list returns an array of Objects (analogous to a row in a ResultSet). Then retrieve the appropriate element of the array based on your query: the order of the items in your ‘SELECT ...’ clause determines the order of the objects in the Object[].

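// .. code to create a Query object
List list = q.list();
for (int i = 0; i < list.size(); i++) {
    Object[] row = (Object[]) list.get(i); // elements follow the SELECT order
    // ... retrieve row[0], row[1], etc. and cast to the appropriate types
}

If you’re having trouble finding out the Java type of an element in a row, I’ve found Hibern8IDE to be an excellent help in running, testing and debugging Hibernate queries.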
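To make the row handling concrete, here is a self-contained sketch that doesn’t touch Hibernate at all. The hypothetical fakeQueryResults() method stands in for q.list() on a query like “select r.category, count(r) from Recipe r group by r.category” (the Recipe entity and its category property are made up for illustration). The only point is the shape of the result: one Object[] per row, indexed in SELECT order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReportRows {

    // Hypothetical stand-in for q.list() on
    // "select r.category, count(r) from Recipe r group by r.category".
    // Each element is one row; array positions follow the SELECT clause.
    static List<Object[]> fakeQueryResults() {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"Breakfast", 12L});
        rows.add(new Object[] {"Soup", 7L});
        return rows;
    }

    // Turns the raw rows into a category -> count map.
    static Map<String, Long> countsByCategory(List<Object[]> rows) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Object[] row : rows) {
            String category = (String) row[0];          // first SELECT item
            long total = ((Number) row[1]).longValue(); // second SELECT item
            counts.put(category, total);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : countsByCategory(fakeQueryResults()).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Casting the count through Number rather than a specific wrapper keeps the code working whether the driver or Hibernate version hands back an Integer or a Long. One thing to watch for: when the SELECT clause has only a single item, Hibernate returns that value directly instead of wrapping it in a one-element Object[].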

Creating RSS using Java

I wanted to create RSS feeds for karensrecipes.com using Java. I did my ‘research’, came to this page: Ben Hammersley.com: Java RSS libraries, and then used the RSS4j library to create a servlet that serves up dynamic RSS feeds of the 10 most recently created recipes per category (samples: Breakfast, Soup, Barbeque..).

The syntax is pretty simple: you create an RssDocument and set which version you want to use (RSS 1.0, 0.9 or 0.91):

RssDocument doc = new RssDocument();
doc.setVersion(RssDocument.VERSION_10);

and then create an RssChannel object and add it to the RssDocument:

RssChannel channel = new RssChannel();
channel.setChannelTitle("Karens Recipes | Most Recent");
channel.setChannelLink("http://www.karensrecipes.com/3/Soup/default.jsp");
channel.setChannelDescription("The 10 most recently added recipes in the soup category.");
channel.setChannelUri("http://www.karensrecipes.com/rss/?categoryid=3");
doc.addChannel(channel);

Next, retrieve the items from a database, the file system, etc. and add each one as an RssChannelItem:

// connect to the datasource
// iterate over something (db? vector?...)
RssChannelItem item = new RssChannelItem();
item.setItemTitle(label);
item.setItemLink(link);
item.setItemDescription(description);
channel.addItem(item);

and then finally, call the generateRss() method on the RssGenerator class. In this case I’m sending the output to the servlet’s PrintWriter:

PrintWriter out = response.getWriter();
RssGenerator.generateRss(doc, out);

You could just as easily write it to a file:

File file = new File("/opt/data/rss.xml");
try{
RssGenerator.generateRss(doc, file);
System.out.println("RSS file written.");
}
catch(RssGenerationException e){
e.printStackTrace();
}

Simple. Easy to use.
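For anyone curious what the channel/item structure above boils down to on the wire, here is a rough, JDK-only sketch that builds an RSS 0.91-style document by hand. It does not use RSS4j; the Item class, toRss() method, URLs and the “en-us” language code are hypothetical, and the output has not been run through a feed validator.

```java
import java.util.ArrayList;
import java.util.List;

class TinyRssWriter {

    // Hypothetical item holder (not an RSS4j class).
    static final class Item {
        final String title;
        final String link;
        final String description;

        Item(String title, String link, String description) {
            this.title = title;
            this.link = link;
            this.description = description;
        }
    }

    // Escapes the characters that are special in XML text.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Appends <tag>escaped text</tag> on its own line.
    private static void element(StringBuilder out, String indent, String tag, String text) {
        out.append(indent).append('<').append(tag).append('>')
           .append(escape(text))
           .append("</").append(tag).append(">\n");
    }

    // One channel (title/link/description), then one <item> per entry.
    static String toRss(String title, String link, String description, List<Item> items) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.append("<rss version=\"0.91\">\n");
        out.append("  <channel>\n");
        element(out, "    ", "title", title);
        element(out, "    ", "link", link);
        element(out, "    ", "description", description);
        element(out, "    ", "language", "en-us"); // assumed language code
        for (Item item : items) {
            out.append("    <item>\n");
            element(out, "      ", "title", item.title);
            element(out, "      ", "link", item.link);
            element(out, "      ", "description", item.description);
            out.append("    </item>\n");
        }
        out.append("  </channel>\n");
        out.append("</rss>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        // hypothetical recipe entry
        items.add(new Item("Tomato & Basil Soup", "http://www.example.com/recipes/1",
                "A hypothetical recipe description."));
        System.out.print(toRss("Most Recent", "http://www.example.com/",
                "The most recently added recipes.", items));
    }
}
```

Doing it by hand makes the escaping obvious (an unescaped “&” in a recipe title breaks the whole feed), which is exactly the kind of detail a library like RSS4j takes care of for you.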

HTTP Spider & Lucene

Spent the majority of my day today refactoring the HTTP spider and Lucene indexing application I’ve been writing on and off for the last couple of months as a learning exercise. One of the first things I did was modify the 3 modules to implement the Runnable interface rather than extend the Thread class. Big thanks to Joe for his detailed thoughts on the subject. Probably the biggest reason for doing so is that Java allows only one superclass, and a class that implements Runnable is still free to use it: the modules (a class that retrieves web pages, a class that indexes the web pages using Lucene and a class that saves the resulting pages to a database) could later extend some type of task/thread class that I’d want to implement in the future (again, a Joe suggestion).
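A minimal sketch of the Runnable approach, assuming a hypothetical SpiderTask base class and a hypothetical PageIndexer module (the actual spider classes aren’t shown in the post). Because PageIndexer implements Runnable instead of extending Thread, it can extend SpiderTask and still be handed either to a plain Thread or to an executor from java.util.concurrent (Java 5 and later).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical base class. A class that extended Thread could not also extend this.
abstract class SpiderTask {
    private final String name;

    SpiderTask(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical module: extends the base class AND is runnable on any thread.
class PageIndexer extends SpiderTask implements Runnable {
    final AtomicInteger processed = new AtomicInteger();

    PageIndexer() {
        super("indexer");
    }

    @Override
    public void run() {
        // stand-in for "index the next batch of pages"
        for (int i = 0; i < 3; i++) {
            processed.incrementAndGet();
        }
    }
}

class RunnableDemo {
    public static void main(String[] args) throws InterruptedException {
        // Option 1: wrap the Runnable in a plain Thread.
        PageIndexer a = new PageIndexer();
        Thread t = new Thread(a, a.getName());
        t.start();
        t.join();

        // Option 2: hand the same kind of object to an executor.
        PageIndexer b = new PageIndexer();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(b);
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);

        System.out.println(a.processed.get() + " " + b.processed.get()); // prints "3 3"
    }
}
```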

After completing that, I explored the various ways in which one might interface with the software; right now the only way is via the command line with multiple arguments. Since remembering command line arguments can be tedious, I looked at the Properties class, whose methods let you load a text file of key/value pairs and then read and change individual properties with getProperty() and setProperty(). java.sun.com has an introduction to the System and Properties classes.
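Here is a small sketch of swapping command line arguments for a properties file. The keys (start.url, index.dir, spider.threads) are hypothetical, and a StringReader stands in for a FileReader on something like spider.properties so the example runs on its own. Note that load(Reader) needs Java 6 or later; older JDKs only have load(InputStream).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

class SpiderConfig {

    // Loads key=value pairs from any Reader (e.g. a FileReader on spider.properties).
    static Properties load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return props;
    }

    public static void main(String[] args) throws IOException {
        // hypothetical contents of spider.properties
        String file = "# spider settings\n"
                + "start.url=http://www.example.com/\n"
                + "index.dir=/opt/data/index\n";
        Properties props = load(new StringReader(file));

        String startUrl = props.getProperty("start.url");
        // the second argument is the default used when the key is missing
        int threads = Integer.parseInt(props.getProperty("spider.threads", "1"));

        // setProperty changes only the in-memory copy; call store() to write it back
        props.setProperty("index.dir", "/tmp/index");

        System.out.println(startUrl + " " + threads + " " + props.getProperty("index.dir"));
    }
}
```

Supplying defaults through getProperty(key, default) keeps the file short: only the settings that differ from the defaults need to be listed.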

Finally, I rewrote each module (mentioned above) so that, while still running inside a while(boolean) loop, it sleeps for half a second on each pass through the loop. Hopefully (and it appears this is true) this means that the CPU isn’t stressed too much.
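A sketch of that loop, assuming a hypothetical PollingWorker class whose flag is a volatile boolean (so a stop request made from another thread is seen by the worker) and whose sleep interval is a constructor parameter; the post’s version corresponds to 500 milliseconds.

```java
class PollingWorker implements Runnable {
    private volatile boolean running = true; // cleared by stop() from another thread
    private final long sleepMillis;
    private int passes = 0;                  // only touched by the worker thread

    PollingWorker(long sleepMillis) {
        this.sleepMillis = sleepMillis;
    }

    void stop() {
        running = false;
    }

    // Safe to read after the worker thread has been joined.
    int getPasses() {
        return passes;
    }

    @Override
    public void run() {
        while (running) {
            passes++; // stand-in for "check for work and do one unit of it"
            try {
                Thread.sleep(sleepMillis); // yield the CPU between passes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PollingWorker worker = new PollingWorker(500);
        Thread t = new Thread(worker);
        t.start();
        Thread.sleep(1200);
        worker.stop();
        t.join();
        System.out.println("passes: " + worker.getPasses());
    }
}
```

Because the flag is only checked between sleeps, stop() can take up to one sleep interval to take effect.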

I uploaded the source here (it also requires the Commons HttpClient jar, the Commons Logging jar and the Lucene jar). If you’re a Java programmer, I’d love your feedback on the code, not from a feature standpoint but from a syntax and architectural standpoint (i.e., I care less about whether you’d actually use this and more about what you think of the code). How would you change it? What did I do wrong? What did I do right?

lindex

The new DevNet Resource Kit on Macromedia.com contains a utility called lindex (supposedly short for Lucene Index), allowing developers to create full text searching applications using ColdFusion on any platform that supports Java… very cool.

I’m relatively certain it’s been discussed to death before, but why charge for this kind of stuff? The lindex component above wouldn’t be that hard to do and was probably hacked by Christian Cantrell in his spare time for fun… give it away. Macromedia isn’t in the business of selling tutorials and snippets. They sell boxed product last time I checked. Anyway…