{"id":703,"date":"2005-08-12T15:20:27","date_gmt":"2005-08-12T19:20:27","guid":{"rendered":"http:\/\/wordpress.cephas.net\/?p=703"},"modified":"2005-08-12T15:20:27","modified_gmt":"2005-08-12T19:20:27","slug":"xml-characters-smart-quotes-and-apache-xml-rpc","status":"publish","type":"post","link":"https:\/\/cephas.net\/blog\/2005\/08\/12\/xml-characters-smart-quotes-and-apache-xml-rpc\/","title":{"rendered":"XML characters, smart quotes and Apache XML-RPC"},"content":{"rendered":"<p>I&#8217;ve been <a href=\"http:\/\/www.joelonsoftware.com\/articles\/fog0000000012.html\">eating my own dogfood<\/a> with the <a href=\"http:\/\/cephas.net\/projects\/deliciousposter\/\">deliciousposter project<\/a> (as you can see from my <a href=\"http:\/\/cephas.net\/blog\/daily_links\/\">daily links<\/a>). A couple of days ago I posted some links to del.icio.us and expected them to show up automatically the next day&#8230; except they didn&#8217;t.  I traced it down to an errant smart quote that I copied from the <a href=\"http:\/\/internetalchemy.org\/2005\/07\/talis-web-20-and-all-that\">Internet Alchemy Talis, Web 2.0 and All That<\/a> post, which caused the Apache XML-RPC library to throw this error:<br \/>\n<code><br \/>\njava.io.IOException: Invalid character data corresponding to XML entity &amp;#8217;<br \/>\n<\/code><br \/>\nI worked under the assumption that the <a href=\"http:\/\/www.intertwingly.net\/blog\/?q=1685\">smart quote was an invalid XML character<\/a> for quite a while, but it looks like it actually is valid. According to the <a href=\"http:\/\/www.w3.org\/TR\/2004\/REC-xml11-20040204\/\">XML 1.1 specification<\/a>, the following <a href=\"http:\/\/www.w3.org\/TR\/2004\/REC-xml11-20040204\/#charsets\">characters<\/a> are allowed in an XML document:<br \/>\n<code><br \/>\n#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]<br \/>\n<\/code><br \/>\nI then checked the <a 
href=\"http:\/\/ws.apache.org\/xmlrpc\/clover\/org\/apache\/xmlrpc\/XmlWriter.html\">source code for the XmlWriter<\/a>, which contains this check for writing character data:<br \/>\n<code><br \/>\n...<br \/>\nif (c &lt; 0x20 || c &gt; 0xff) {<br \/>\n&nbsp;&nbsp;\/\/ Though the XML-RPC spec allows any ASCII<br \/>\n&nbsp;&nbsp;\/\/ characters except '&lt;' and '&amp;', the XML spec<br \/>\n&nbsp;&nbsp;\/\/ does not allow this range of characters,<br \/>\n&nbsp;&nbsp;\/\/ resulting in a parse error from most XML<br \/>\n&nbsp;&nbsp;\/\/ parsers.<br \/>\n&nbsp;&nbsp;throw new XmlRpcClientException(\"Invalid character data \" +<br \/>\n&nbsp;&nbsp;\"corresponding to XML entity &amp;#\" +<br \/>\n&nbsp;&nbsp;String.valueOf((int) c) + ';', null);<br \/>\n} else ..<br \/>\n<\/code><br \/>\nwhich turns out to be a tad aggressive: it rejects every character above 0xff, including the smart quote.  It also turns out that the above code snippet and the version of the Apache XML-RPC library I was using are out of date.  The <code>chardata(String text)<\/code> method has been updated in the latest version of the Apache XML-RPC library to use a new method called <code>isValidXMLChar(char c)<\/code>, which is much more lenient:<br \/>\n<code><br \/>\nif (c == '\\n') return true;<br \/>\nif (c == '\\r') return true;<br \/>\nif (c == '\\t') return true;<br \/>\nif (c<br \/>\n<\/code><br \/>\nand, not coincidentally, is compliant with the specification.<\/p>\n<p>I'll be updating deliciousposter to use the latest version of the Apache XML-RPC library soon.  In the meantime, if you're using the Apache XML-RPC library, you should probably download the latest version to take advantage of the new XML character validation method.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been eating my own dogfood with the deliciousposter project (as you can see from my daily links). A couple of days ago I posted some links to del.icio.us and expected them to show up automatically the next day&#8230; except they didn&#8217;t. 
I traced it down to an errant smart quote that I copied from &hellip; <a href=\"https:\/\/cephas.net\/blog\/2005\/08\/12\/xml-characters-smart-quotes-and-apache-xml-rpc\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">XML characters, smart quotes and Apache XML-RPC<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[3,4,2,10],"tags":[],"_links":{"self":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/703"}],"collection":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/comments?post=703"}],"version-history":[{"count":0,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/703\/revisions"}],"wp:attachment":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/media?parent=703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/categories?post=703"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/tags?post=703"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}