ASP, Java, Cookies and URLEncode

One of the bugs I tracked down a couple months ago dealt with the different ways in which ASP and Java handle URL encoding in cookies. ASP automatically encodes and decodes cookie values (without you telling it to, even though the Server object has a method ‘URLEncode’ which does the same job), while Java requires that you explicitly use the URLDecoder and URLEncoder classes; otherwise it sends the cookie values without encoding them first. In my case that led to an interoperability problem (one side assumes URL encoding of cookie values, the other doesn’t).

But then it gets weirder. The encoding that ASP automatically applies is different from the encoding that Java applies. For example, in ASP you’d write something like this to set a cookie with my email address:

Response.cookies("encoded") = "ajohnson@cephas.net"

which would then result in this:

ajohnson%40cephas%2Enet

whereas in Java you’d have something like this:

String email = "ajohnson@cephas.net";
String encoded = URLEncoder.encode(email, "UTF-8");
res.addCookie(new Cookie("encoded", encoded));

which results in this:

ajohnson%40cephas.net
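The Java half of that comparison is easy to check outside a servlet container. Here is a minimal sketch using only java.net.URLEncoder and java.net.URLDecoder; the class name CookieEncodingDemo and its two helper methods are made up for illustration and are not part of any API. It also shows that URLDecoder turns both the ASP-style and the Java-style value back into the same string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; only URLEncoder and URLDecoder are real JDK APIs.
public class CookieEncodingDemo {

    // Encode a cookie value the same way the snippet above does.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Decode a cookie value that was URL-encoded by either side.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String email = "ajohnson@cephas.net";

        // Java leaves the period alone.
        System.out.println(encode(email));                     // ajohnson%40cephas.net

        // Both encodings decode to the same value.
        System.out.println(decode("ajohnson%40cephas%2Enet")); // ajohnson@cephas.net
        System.out.println(decode("ajohnson%40cephas.net"));   // ajohnson@cephas.net
    }
}
```

So the two encodings only cause trouble when one side compares or forwards the raw cookie text instead of decoding it first.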

So why does ASP encode the period as %2E and Java leave it alone? The Microsoft documentation for the ASP Server object says very little about the URLEncode method other than that it

… applies URL encoding rules, including escape characters, to a specified string.

(which could be a teensy bit more helpful; must have been a programmer writing that documentation). On the other hand, the Java URLEncoder class documentation says that the special characters “.”, “-”, “*”, and “_” are not to be encoded according to the HTML specification, which itself cites RFC 1738, which says:

…the special characters “$-_.+!*’(),”, and reserved characters used for their reserved purposes may be used unencoded within a URL.

which seems to me to imply that the ASP Cookie object is wrong (i.e., it should leave the period alone in the email address). And beyond that, it’s pretty aggressive in assuming that I want to encode the data in my cookies. Am I missing something? How do other languages (PHP, Python, Perl, etc.) implement the same functionality? Aren’t web applications fun?
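If a Java application has to write a cookie that matches the ASP output byte for byte, one option is to do the encoding by hand. The sketch below is a guess, not documented ASP behavior: it assumes the rule is "percent-encode every byte that is not an ASCII letter or digit", which I picked only because it reproduces the one sample above. AggressiveCookieEncoder and encodeAggressively are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the encoding rule is an assumption, not ASP's documented algorithm.
public class AggressiveCookieEncoder {

    // Percent-encode every UTF-8 byte except ASCII letters and digits.
    static String encodeAggressively(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alphanumeric = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alphanumeric) {
                out.append((char) c);
            } else {
                // Two uppercase hex digits, e.g. '.' becomes %2E.
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // The standard decoder reads the aggressive form without any special handling.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encodeAggressively("ajohnson@cephas.net");
        System.out.println(encoded);          // ajohnson%40cephas%2Enet
        System.out.println(decode(encoded));  // ajohnson@cephas.net
    }
}
```

In most cases it is simpler to leave the encoders alone and have both applications decode the cookie before comparing values; matching the raw text only matters when something in between inspects the cookie verbatim.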

4 thoughts on “ASP, Java, Cookies and URLEncode”

  1. Couple of points… first, I think cookies are actually supposed to be opaque to anyone but the server. That is, no particular encoding scheme at all is required by the cookie specification (RFC 2965), except that the end result fits into 7-bit ASCII and doesn’t use certain illegal characters. URL-encoding is one convenient and logical choice for such encoding, but I think an application server could choose (say) Base-64 encoding and be just as compliant to any spec that might apply.

    Anyway, even if URL-encoding is mandated, that still wouldn’t make ASP wrong… “ajohnson%40cephas%2Enet” is a perfectly valid URL-encoded version of your e-mail address, in that any URL decoder should transform it to the desired value.

    The interesting question, to me, is whether ASP is being “too aggressive” in URL encoding/decoding everything. In what situations would automatic URL encoding/decoding hurt you? The only thing that pops into my head is, if you have an encoding/decoding scheme that is more appropriate/efficient to your data. Or your case is a good example as well, where you might want to tightly control how the encoding works because some other piece of software you don’t have much control over will need to inspect it.

    However, automatic URL encoding/decoding protects you from ever putting illegal characters into a cookie. Depending on how Java handles this–does it throw an exception when you try to set the value, or does it send the illegal characters to the user-agent and hope for the best?–you could end up with real problems. And if your app makes it through testing without encountering illegal characters, you might not know this is a problem until it’s too late.

    Either approach seems reasonable to me–for ASP’s target audience, Microsoft probably made a good choice.

    If you think ASP is too “aggressive”, google PHP’s “magic quotes” (mis-)feature.

  2. PHP’s URL encoding is similar to Java’s. It does not encode the period, but does encode the ampersand. However, PHP’s URL decoding method will convert both the Java/PHP encoded string and the ASP encoded string into identical output.

    PHP 5 contains a new function that does not do any URL encoding, in case you want to explicitly control how this is done.

  3. I think the key word in RFC 1738 is *may*, as in “special characters … may be used unencoded”. You don’t violate the spec to encode more characters than necessary. In fact, you could even encode the alpha characters if you wanted to. Insofar as encoding the ‘.’, the ASP Cookie object is not wrong to do so.

    Best to keep in mind the networking dictum to be fussy about what you put out, but tolerant about what you take in.
