Category Archives: Systems Administration

Java ZipEntry bug on Windows

I rolled out the Clearfox plugin on the Jive Software Community site a couple weeks ago and got some good feedback and some bad feedback. A number of people said they tried to install the Firefox part of the plugin, restarted Firefox and then didn’t see the Clearspace icon like my screenshots / screencast showed. There were no errors in the Clearspace error logs and no errors showed up in the Firefox JavaScript debug console. With the help of a couple customers, I was able to narrow it down to running Clearspace on Windows: for some reason the zip file (really the XPI file) that the Clearfox plugin creates on the fly was invalid, at least according to Firefox. If you opened the XPI file using any common zip file utility the contents appeared to be fine. As always, Google came to the rescue and pointed me to this bug filed on bugs.sun.com, which has two parts. The Unicode file name bug didn’t matter to me, but this one did:

Within a ZIP file, pathnames use the forward slash / as separator, as required by the ZIP spec. This requires a conversion from or to the local file.separator on systems like Windows. The API (ZipEntry) does not take care of the transformation, and the need for the programmer to deal with it is not documented.

which wouldn’t hurt so much if it hadn’t been filed back in… get this… 1999. Are you kidding me?

Anyway, long story short: if you’re writing Java, creating a zip file that has paths while on a Windows-based machine and deploying said zip file to a place that actually cares about the zip file specification (or violates Postel’s Law), then make sure to do something like this in your Java code:

// ZIP entry names must use '/' no matter what the local separator is.
String zipFilePath = file.getPath();
if (File.separatorChar != '/') {
  zipFilePath = zipFilePath.replace(File.separatorChar, '/');
}
ZipEntry zipAdd = new ZipEntry(zipFilePath);

noting that even the workaround they give in the aforementioned bug is incorrect because they show

... file.getName();

which doesn’t contain the path separators. Awesome.
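For anyone who wants to try the fix without a real file tree, here is a minimal, self-contained sketch of the same idea. The class name ZipPaths, its two methods and the sample path are hypothetical and are not taken from the Clearfox plugin. It also assumes your file names never contain a literal backslash, which is why it replaces backslashes unconditionally instead of checking File.separatorChar.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only.
final class ZipPaths {
    private ZipPaths() {}

    /** Converts a local relative path into a ZIP entry name that uses '/'. */
    static String toEntryName(String localPath) {
        // Assumption: a backslash is always a separator, never part of a file name.
        return localPath.replace('\\', '/');
    }

    /** Writes one entry to an in-memory ZIP, normalizing the entry name first. */
    static byte[] zipSingleFile(String localPath, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(toEntryName(localPath)));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Calling ZipPaths.toEntryName("chrome\\content\\overlay.js") returns the forward-slash form that the ZIP spec expects, regardless of which platform the code runs on.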

Java, Commons HTTP Client and HTTP proxies

If you’re living at a giant corporation during the day and you want to browse the web you’re probably going through some sort of proxy to the outside world. Most of the time you don’t care, but if you’re writing a Java application that needs to access resources on the other side of said proxy (i.e. the rest of the world), you’ll eventually end up over here. That wonderful document will hook you up with all you need to know about setting the proxy host, port and optionally a username and password for your proxy as long as you’re using URLConnection, HttpURLConnection or anything that deals with the class URL. If you’re really a go-getter you might even browse over here and read all about how to utilize those properties on the command line, in code or when you’re deployed inside of Tomcat.

Some of you won’t be so lucky: you’ll eventually want to use some advanced tools that abstract you away from having to fetch InputStreams to get your feeds and will instead depend on the Commons HTTP Client, which unfortunately (or fortunately, depending on your point of view) doesn’t care about those nice little system properties that java.net.URL likes and instead goes off and uses sockets directly. With it, you have to do something like this:

HttpClient client = new HttpClient();
// The proxy lives on the host configuration; no connection manager lookup is needed.
client.getHostConfiguration().setProxy("proxyserver.example.com", 8080);

and if you want to provide a username and password for said proxy:

HttpState state = new HttpState();
state.setProxyCredentials(null, null,
   new UsernamePasswordCredentials("username", "password"));
client.setState(state);

which is all fine and dandy but sometimes I just wish the world were simpler. Ya know?
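One way to make the world a little simpler is to read the standard http.proxyHost and http.proxyPort system properties yourself and hand the values to HttpClient. What follows is only a sketch: ProxySettings is a hypothetical class, and it only uses the JDK, so the HttpClient call itself appears as a comment.

```java
import java.util.Optional;

// Hypothetical helper: reads the same system properties that java.net.URL honors.
final class ProxySettings {
    final String host;
    final int port;

    private ProxySettings(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Returns the configured HTTP proxy, or empty when http.proxyHost is not set. */
    static Optional<ProxySettings> fromSystemProperties() {
        String host = System.getProperty("http.proxyHost");
        if (host == null || host.isEmpty()) {
            return Optional.empty();
        }
        // java.net falls back to port 80 when http.proxyPort is missing.
        int port = Integer.parseInt(System.getProperty("http.proxyPort", "80"));
        return Optional.of(new ProxySettings(host, port));
    }
}

// Usage with Commons HttpClient (not compiled here):
//   ProxySettings.fromSystemProperties().ifPresent(
//       p -> client.getHostConfiguration().setProxy(p.host, p.port));
```

With this in place, the -Dhttp.proxyHost and -Dhttp.proxyPort flags you already pass for URLConnection would cover HttpClient as well.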

Using a custom socket factory with HttpClient

The documentation on how to use a custom socket factory in HttpClient is actually pretty good, but there’s one thing they don’t make very clear. If you want to specify a per-host socket factory like this:

Protocol p = new Protocol("https", new MySSLSocketFactory(), 443);
HttpClient httpclient = new HttpClient();
httpclient.getHostConfiguration().setHost("example.com", 443, p);
GetMethod httpget = new GetMethod("/mypath");

you must specify a path and not the full URL in the GetMethod constructor. If you don’t, in other words, if you use a constructor like this:

GetMethod httpget = new GetMethod("https://example.com/mypath");

the custom socket factory you specified in the host configuration will not be used.

Finding the number of files of a certain file type that a process has open

Used this week when trying to debug a problem with an application that was resulting in errors that said “too many files open”:

/usr/sbin/lsof -p "$processID" | grep "\.$ext" | wc -l

where $processID is the ID of the process that is opening the files and $ext is the (most likely) three letter file extension of the file that is being opened.

Also, if you run:

ulimit

and you get ‘unlimited’ as the result, you’d be wrong to assume that you have an unlimited number of file handles available to you. With no option, ulimit reports the maximum file size, not the open file limit:

ulimit -n

will give you the maximum number of file handles a process is allowed to have open.

And that’s one to grow on.

More here.

The Referer header, intranets and privacy

I’ve discussed meaningful URLs a number of times on this site: one of the biggest benefits of a good blog URL is that you can infer who posted the article, when it was posted and what the blog post is about. For the most part this is all ‘a good thing’. But when you’re blogging on an intranet and you create a blog post that results in a URL like this:

http://intranet.example.com/blogs/aaron/2007/02/07/our-secret-widget-is-going-to-kill-our-competition

and then in the blog post you put a couple links to your competition and embed a picture of their latest product, you’re potentially letting secrets through the firewall without even knowing it. See, HTTP has this really nice mechanism for specifying both a) what page an image is loading in and b) what page the user was on when they clicked on a link to visit the next page. It’s called the HTTP referer and it’s commonly used for good: web statistics packages (like Google Analytics or AWStats) use the referer header to show you click paths through your site and to show you what other websites are linking to you. A typical request in an Apache HTTPD log file might look something like this:

86.105.195.89 - - [06/Feb/2007:01:54:32 -0500] "GET /blogs/aaron/ HTTP/1.1" 200 34659 "http://intranet.example.com/blogs" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1; .NET CLR 2.0.50727) Gecko/20061204 Firefox/2.0.0.1"

but back to the point at hand: if you’re using blogs or wikis or anything that might produce a clean, understandable, meaningful URL and you or your company are serious about security, you’ll want to make sure that HTTP Referers are blocked because you really don’t want the president of your company breathing down your neck on a Monday morning because your competition just called… and they know. Here’s how:

  • Force anyone / everyone reading your internal site to use a Firefox plugin called RefControl, which allows you to control what gets sent in the referer field per website (you can do the same thing by modifying your Firefox config, but the plugin is easier). Unless you’re the IT guy and you can force people to use this plugin, it’s doubtful this would work.
  • Force all of your outgoing links through what’s called a dereferer. Again, this is unwieldy, can probably be subverted and may not work for images.
  • Use HTTPS for all the pages on your intranet because RFC 2616 states that:

    Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.

    which means that even if someone does create a link to your competition’s website on the intranet, your competition won’t find out.
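The RFC 2616 rule quoted in the last bullet boils down to a single check on the two URL schemes. Here is a rough sketch of that check; RefererPolicy and mayIncludeReferer are hypothetical names, and since the RFC only says SHOULD NOT, a given browser is not guaranteed to behave exactly this way.

```java
import java.net.URI;

// Hypothetical illustration of the RFC 2616 recommendation quoted above.
final class RefererPolicy {
    private RefererPolicy() {}

    /**
     * Returns false when the referring page was loaded over HTTPS and the
     * request being made is plain HTTP; returns true in every other case.
     */
    static boolean mayIncludeReferer(URI referringPage, URI target) {
        boolean fromSecure = "https".equalsIgnoreCase(referringPage.getScheme());
        boolean toSecure = "https".equalsIgnoreCase(target.getScheme());
        return !(fromSecure && !toSecure);
    }
}
```

So a link from https://intranet.example.com/blogs/aaron/ to http://www.example.com/ would go out without a Referer header, while the same link from an http:// intranet page would carry the full blog post URL.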

On a semi-related note, here are a couple things I learned from reading this article by Eric Lawrence (creator of the fine HTTP Fiddler Tool for Windows):

  • Fiddler has a really cool diff feature where you can select two sessions, right click and select WinDiff from the menu
  • somehow he’s got Firefox hooked up to Fiddler… I gotta learn how.
  • example.com is reserved by RFC2606 specifically for the purpose of blog posts like this. Try the link. Who knew?

Firefox mimeTypes.rdf corruption

Came across another interesting bug today involving Firefox and mime types. Firefox uses a file called mimeTypes.rdf (stored in your profile folder) to keep track of a) what application should be opening the file you’re downloading and b) what kind of file it should tell a server it’s sending when you upload a file. And it works … for the most part. See, if you download a PDF file from a server that (incorrectly) states that the content-type of the file is ‘application/unknown’, choose to open it using Adobe Acrobat and then check the box that says ‘Do this automatically from now on’, Firefox will store that bit of knowledge away in mimeTypes.rdf. Now go and use a web application that you upload files to and which analyzes the content-type of the files you’re uploading and upload a PDF file. If you’re using LiveHTTPHeaders, you’ll notice that you’re not sending ‘application/pdf’ but instead ‘application/x-download’.

It looks like this bug was filed in Bugzilla a couple times and even acknowledged in their documentation, but has yet to be fixed. You can ‘fix’ the problem by deleting your mimeTypes.rdf file and restarting Firefox.

RSS/Atom feeds, Last Modified and Etags

Sometime last week I read this piece by Sam Ruby, which, summarized, says this:

…don’t send Etag and Last-Modified headers unless you really mean it. But if you can support it, please do. It will save you some bandwidth and your readers some processing.

The product I’ve been working on at work (which I can talk about now) for the last couple months uses feeds (either Atom, RSS 1.0 or RSS 2.0, your choice) extensively but didn’t have Etag or Last-Modified support so I spent a couple hours working on it this past weekend. We’re using ROME, so the code ended up looking something like this:

HttpServletRequest request = ...
HttpServletResponse response = ...
SyndFeed feed = ...
if (!isModified(request, feed)) {
  response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
} else {
  long publishDate = feed.getPublishedDate().getTime();
  response.setDateHeader("Last-Modified", publishDate);
  response.setHeader("Etag", getEtag(feed));
}
...
private String getEtag(SyndFeed feed) {
  return "\"" + String.valueOf(feed.getPublishedDate().getTime()) + "\"";
}
...
private boolean isModified(HttpServletRequest request, SyndFeed feed) {
  // Only treat the request as conditional when both validators are present.
  if (request.getHeader("If-Modified-Since") != null && request.getHeader("If-None-Match") != null) {
    String feedTag = getEtag(feed);
    String eTag = request.getHeader("If-None-Match");
    Calendar ifModifiedSince = Calendar.getInstance();
    ifModifiedSince.setTimeInMillis(request.getDateHeader("If-Modified-Since"));
    Calendar publishDate = Calendar.getInstance();
    publishDate.setTime(feed.getPublishedDate());
    // HTTP dates have one-second resolution, so drop the milliseconds before comparing.
    publishDate.set(Calendar.MILLISECOND, 0);
    int diff = ifModifiedSince.compareTo(publishDate);
    return diff != 0 || !eTag.equalsIgnoreCase(feedTag);
  } else {
    // Not a conditional request: always send the feed.
    return true;
  }
}

There are only two gotchas in the code:

  1. The value of the Etag must be quoted, hence the getEtag(...) method above returning a string wrapped in quotes. Not hard to do, but easy to miss.
  2. The first block of code above uses setDateHeader(String name, long date) to set the ‘Last-Modified’ HTTP header, which conveniently takes care of formatting the given date as an HTTP date (the RFC 1123 format, itself an update of RFC 822). The published date comes from ROME. Here’s where it gets tricky: if the client returns the ‘If-Modified-Since’ header and you retrieve said date from the request using getDateHeader(String name), you get a long holding milliseconds since the epoch, not a Date. That value is independent of any timezone, so it can be compared directly against the publish time; the Calendar instances in the code above are just a convenient way to hold and adjust the two values. But there’s still one thing left: HTTP dates don’t carry milliseconds, so if the long value you hand to setDateHeader(String name, long date) contains a millisecond value and you then compare that same value against the ‘If-Modified-Since’ header, you’ll never get a match. The easy way around that is to set the millisecond field of the published date to zero before comparing, which is what publishDate.set(Calendar.MILLISECOND, 0) does.
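To make the two gotchas easier to play with outside a servlet container, here is a stripped-down sketch of the same comparison using plain longs and strings. ConditionalGet and its method names are hypothetical, the caller is assumed to pass in the values it read from the request, and unlike the code above it compares Etags with equals rather than equalsIgnoreCase.

```java
// Hypothetical, servlet-free version of the check, for illustration only.
final class ConditionalGet {
    private ConditionalGet() {}

    /** Builds a quoted entity tag from the feed's publish time. */
    static String etagFor(long publishedMillis) {
        return "\"" + publishedMillis + "\"";
    }

    /**
     * @param ifModifiedSinceMillis result of getDateHeader("If-Modified-Since"),
     *                              which is -1 when the header is absent
     * @param ifNoneMatch           raw If-None-Match header value, or null
     * @param publishedMillis       the feed's publish time in epoch milliseconds
     */
    static boolean isModified(long ifModifiedSinceMillis, String ifNoneMatch, long publishedMillis) {
        if (ifModifiedSinceMillis < 0 || ifNoneMatch == null) {
            return true; // not a fully conditional request, so send the feed
        }
        // HTTP dates have one-second resolution: drop the milliseconds first.
        long publishedToTheSecond = (publishedMillis / 1000L) * 1000L;
        boolean sameDate = ifModifiedSinceMillis == publishedToTheSecond;
        boolean sameTag = ifNoneMatch.equals(etagFor(publishedMillis));
        return !(sameDate && sameTag);
    }
}
```

In a servlet you would call it as ConditionalGet.isModified(request.getDateHeader("If-Modified-Since"), request.getHeader("If-None-Match"), feed.getPublishedDate().getTime()).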

If you’re interested, there are a number of other blogs / articles about Etags and Last-Modified headers:

Bulk import of email addresses into Movable Type notifications

One of the bloggers on my Movable Type installation is actually using the notifications feature built into Movable Type (which gives you the ability to let people subscribe to a blog via email) and needed to do a bulk import of email addresses that he already had in CSV format. Unfortunately, Six Apart doesn’t provide a way to do a bulk import so I dug into Google. Because the install is backed by MySQL, the solution was to use the LOAD DATA INFILE command, which reads a CSV file from disk and loads the data into a table you specify. In case someone else ever needs to do this with Movable Type, the syntax I ended up using looked like this:

LOAD DATA LOCAL INFILE 'contacts.csv' 
INTO TABLE mt_notification 
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n' 
(notification_email,notification_blog_id,notification_name)

and my contacts.csv file looked like this:

ajohnson@cephas.net,10,Aaron Johnson
...

Also interesting to note that according to the MySQL documentation, the LOAD DATA INFILE command is usually 20 times faster (!) than using separate INSERT INTO statements.

New design

I got really bored with the old design of this site and all the cool kids seem to be using WordPress these days so last weekend I exported all 900 or so entries from Movable Type and imported them into WordPress, installed the ScribbishWP Theme and wrote a servlet filter to map my /blog/year/month/day/entry_name.html Movable Type permalinks to the WordPress style which is /blog/year/month/day/entry-name/. Oh, and comments are back on.
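The permalink mapping at the heart of that filter can be sketched with a single regular expression. This is not the actual filter: PermalinkMapper is a hypothetical name, and it assumes the only difference between the two schemes is the .html suffix and underscores versus hyphens, which may not hold for every entry.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the path rewrite; the servlet filter plumbing is omitted.
final class PermalinkMapper {
    // Matches /blog/yyyy/mm/dd/entry_name.html
    private static final Pattern MOVABLE_TYPE =
        Pattern.compile("^(/blog/\\d{4}/\\d{2}/\\d{2}/)([^/]+)\\.html$");

    private PermalinkMapper() {}

    /** Returns the WordPress-style path, or null if the path is not a Movable Type permalink. */
    static String toWordPressPath(String path) {
        Matcher m = MOVABLE_TYPE.matcher(path);
        if (!m.matches()) {
            return null;
        }
        // Assumption: underscores in the old entry name become hyphens in the new one.
        return m.group(1) + m.group(2).replace('_', '-') + "/";
    }
}
```

A filter built on this would send a permanent redirect to the returned path whenever it is not null and pass every other request down the chain untouched.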

Enjoy!