XML characters, smart quotes and Apache XML-RPC

I’ve been eating my own dogfood with the deliciousposter project (as you can see from my daily links). A couple days ago I posted a some links to del.icio.us and expected them to show up automatically the next day… except they didn’t. I traced it down to an errant smart quote that I copied from the Internet Alchemy Talis, Web 2.0 and All That post, which caused the Apache XML-RPC library to throw this error:

java.io.IOException: Invalid character data corresponding to XML entity ’

I worked under the assumption that the smart quote was an invalid XML character for quite awhile, but it looks like it actually is according to the XML 1.1 specification, the following characters are allowed in an XML document:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

I then checked the source code for the XmlWriter which has this method for writing character data:

...
if (c < 0x20 || c > 0xff) {
  // Though the XML-RPC spec allows any ASCII
  // characters except '<' and '&', the XML spec
  // does not allow this range of characters,
  // resulting in a parse error from most XML
  // parsers.
  throw new XmlRpcClientException("Invalid character data " +
  "corresponding to XML entity &#" +
  String.valueOf((int) c) + ';', null);
} else ..

which turns out to be a tad aggressive. It also turns out that the above code snippet and the version of the Apache XML-RPC library I was using are out of date. The chardata(String text) has been updated in the latest version of the Apache XMl-RPC library to include a new method called isValidXMLChar(char c) which is much more lenient:

if (c == '\n') return true;
if (c == '\r') return true;
if (c == '\t') return true;
if (c
and not coincidentally, is compliant with the specification.

I'll be updating deliciousposter to use the latest version of the Apache XML-RPC library soon. In the meantime, if you're using the Apache XML-RPC library, you should probably download the latest version to take advantage of the new XML character validation method.

One thought on “XML characters, smart quotes and Apache XML-RPC”

  1. Hi,

    Good afternoon 🙂

    I have go though the topic “XML characters, smart quotes and Apache XML-RPC”.

    I am a new person to the devolopement related with Xindice XML-DB.

    Could you please direct me to get the deatils of the methods(Server procedures) exported by the Apache XML-RPC?

    Many thanks,
    Prasanth Vijay

Leave a Reply to Prasanth M V Cancel reply

Your email address will not be published. Required fields are marked *