java.lang.IllegalArgumentException: Illegal group reference, replaceAll and dollar signs

This weblog is officially about inane things I run into while trying to do my job at work. Let’s say you have a String object like this:

String mystring = "Your password: #PASSWORD";

and at runtime you need to replace the value of #PASSWORD with a password that a user typed in. You’d write something like this:

String password = "$Jslwe";
mystring = mystring.replaceAll("#PASSWORD", password);

What would happen? You’d expect that the key #PASSWORD would get replaced with the value of the variable ‘password’ (which is “$Jslwe”) and then you’d move happily on your way to something much more interesting. But no, Java throws you an error:

java.lang.IllegalArgumentException: Illegal group reference

which is extremely helpful. Turns out that the second argument to the String replaceAll method “may” have some issues with dollar signs and backslashes which you only find out about if you dig into the Matcher class that backs the replaceAll method or if you’re lucky and you read about the whole thing on a site devoted to regular expressions. In short:

myString.replaceAll(“regex”, “replacement”) replaces all regex matches inside the string with the replacement string you specified. No surprises here. All parts of the string that match the regex are replaced. You can use the contents of capturing parentheses in the replacement text via $1, $2, $3, etc. $0 (dollar zero) inserts the entire regex match. $12 is replaced with the 12th backreference if it exists, or with the 1st backreference followed by the literal “2” if there are less than 12 backreferences. If there are 12 or more backreferences, it is not possible to insert the first backreference immediately followed by the literal “2” in the replacement text.

In the replacement text, a dollar sign not followed by a digit causes an IllegalArgumentException to be thrown. If there are less than 9 backreferences, a dollar sign followed by a digit greater than the number of backreferences throws an IndexOutOfBoundsException. So be careful if the replacement string is a user-specified string. To insert a dollar sign as literal text, use \$ in the replacement text. When coding the replacement text as a literal string in your source code, remember that the backslash itself must be escaped too: “\\$”.
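Here is a minimal sketch of two ways around the exception, assuming Java 5 or later. Matcher.quoteReplacement escapes dollar signs and backslashes in the replacement text, and String.replace(CharSequence, CharSequence) avoids regular expressions altogether. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;

public class DollarSafeReplace {

    // Treats both the key and the value as literal text: no regex, no group references.
    static String fillLiteral(String template, String key, String value) {
        return template.replace(key, value);
    }

    // Keeps replaceAll (the key is still a regex) but escapes '$' and '\' in the value.
    static String fillQuoted(String template, String keyRegex, String value) {
        return template.replaceAll(keyRegex, Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String template = "Your password: #PASSWORD";
        String password = "$Jslwe";
        System.out.println(fillLiteral(template, "#PASSWORD", password));
        System.out.println(fillQuoted(template, "#PASSWORD", password));
    }
}
```

Both calls print "Your password: $Jslwe". If the key is a fixed string rather than a pattern, the literal version is probably the simpler choice.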

Java, JTDS, PreparedStatement and varchar

I’ve been working on an interesting application at work that needs to be fast, the faster the better in fact. I wrote a couple of quick and dirty implementations in my scratchpad in Eclipse and I figured that I could get about fifty operations per second (a database UPDATE is involved for every operation, among other things). Anyway, I went to develop a full implementation and then ran a full test of about 100,000 operations. Instead of taking about 30 minutes (100,000 operations / 50 per second = ~ 30 minutes) the operation took about 7 hours. I was getting about 4 operations per second throughput, which was obviously a huge disappointment. The pseudocode I wrote originally looked something like this:

Connection c = DriverManager.getConnection(cs);
String q = "UPDATE mytable SET x = 1 WHERE id = ?";
PreparedStatement p = c.prepareStatement(q);
for (int i = 0; i < somearray.length; i++) {
  p.setString(1, somearray[i]);
  p.executeUpdate();
}

and it worked well. I made a single change during development: instead of using the ‘id’ column of the database table (a 9 byte numeric primary key, and thus the clustered index for the table), I used a 13 byte varchar column with a nonclustered index as the identifier. My code looked like this:

Connection c = DriverManager.getConnection(cs);
String q = "UPDATE mytable SET x = 1 WHERE y = ?";
PreparedStatement p = c.prepareStatement(q);
for (int i = 0; i < somearray.length; i++) {
  p.setString(1, somearray[i]);
  p.executeUpdate();
}

The nonclustered index performed just as well as the clustered index: in my testing an UPDATE statement using the varchar column as the constraint in the query worked just as fast as the primary key / clustered index, which makes sense because index seeks (which I learned about in my database design class this semester) on a 9 byte / 72 bit numeric value (because I used a precision of 19 digits) should be similar to index seeks on a 13 byte / 104 bit varchar column. So then I executed the finished program (not the test) and brought up SQL Profiler (a tool that ships with SQL Server that can debug, troubleshoot, monitor, and measure your application’s SQL statements and stored procedures). It quickly became clear what the problem was. Here’s the SQL created by the prepareStatement() method:

create proc #jtds000001 @P0 varchar(4000) as UPDATE mytable SET x = 1 WHERE y = @P0

and then the executeUpdate() method:

exec #jtds000001 N'005QDUKS1MG8K'

See the problem? The JTDS driver turned the 13 byte varchar column into a 4000 byte varchar column (the maximum number of bytes for a column) and then prefixed the parameter with ‘N’, which is used to identify Unicode data types. This substitution caused the query processor to ignore the index on ‘y’ and do an index scan instead of an index seek.

Here’s where it gets fun. Microsoft SQL Server uses a B-tree index structure (also on Wikipedia), which is similar to a B+tree, except that search key values can only appear once in the tree. Objects are stored in SQL Server as a collection of 8KB pages and (because of the class I’ve been taking) I now know that you can compute the approximate number of disk I/Os for an index seek as:

log_{n/2}(k)

where n is the number of keys per node and k is the number of search keys. So with one million search keys and 8KB pages in SQL Server, an index on a 13 byte key would create a tree with about 615 keys per node (~8000 / 13 = ~615). Thus the index seek in my system was costing about log_{615/2}(1000000) = 2.4 node accesses (one node access ~= one disk IO). An index scan, by contrast, has to walk the leaf pages: one million keys at ~615 keys per page is ~1,626 pages, so figure that on average over time we’ll find the value after reading half of them, or ~813 node accesses, which is significantly longer and obviously the cause of the problem.
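The seek estimate above is easy to check with a few lines of Java. This is only a sketch of the arithmetic, under the same rough assumptions as the text (about 8000 usable bytes per page, nothing stored in a node but 13 byte keys). The class and method names are mine.

```java
public class IndexSeekCost {

    // Approximate node accesses for a B-tree seek: log base (n/2) of k,
    // where n = keys per node and k = number of search keys.
    static double seekCost(double keysPerNode, double searchKeys) {
        return Math.log(searchKeys) / Math.log(keysPerNode / 2.0);
    }

    public static void main(String[] args) {
        double keysPerNode = 8000.0 / 13.0; // roughly 615 keys fit in one 8KB page
        double cost = seekCost(keysPerNode, 1000000);
        System.out.println("keys per node: " + Math.round(keysPerNode));
        System.out.println("approximate node accesses per seek: " + cost);
    }
}
```

The result comes out a little above 2.4, so in practice a seek touches about three pages.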

Moral of the story: watch out for char / varchar constraint parameters when using JTDS and a PreparedStatement. Also, indexes are A Good Thing™.

Updated 12/04/2005: Brian Heineman (one of the maintainers of the JTDS project) points out that this is a feature, not a bug. He also points out that you can work around the issue by appending:


to your database connection string (I tested it out and it works just as advertised). Since the real issue is that JTDS can’t tell if the String instance I’m sending is Unicode or not and so defaults to a Unicode string, the other workaround would be to use the setBytes() method of the PreparedStatement and then use the byte[] representation of the String. From my example above:

p.setBytes(1, somearray[i].getBytes());
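One caveat with the line above: getBytes() with no argument uses the JVM’s default encoding, so the bytes can differ between machines. Below is a small sketch that pins the encoding instead. It assumes the keys are plain ASCII (as in the profiler trace) and Java 7 or later for StandardCharsets. The helper class is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class AsciiKeyBytes {

    // Encodes a key with an explicit charset so the byte[] handed to
    // PreparedStatement.setBytes() does not depend on the platform default.
    static byte[] toAsciiBytes(String key) {
        return key.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = toAsciiBytes("005QDUKS1MG8K");
        System.out.println(bytes.length); // one byte per character for ASCII keys
    }
}
```

With that helper, the call in the loop would become p.setBytes(1, AsciiKeyBytes.toAsciiBytes(somearray[i]));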

Ruby on Rails in the Java community

A couple of months ago I attended the No Fluff Just Stuff conference up in Framingham. I took a bunch of notes which I intended to post to this blog, but never got around to it. The conference tag line is “The best value in the Java / Open Source conferencing space hands down” and I’d have to agree, although the emphasis on Ruby on Rails was surprising. Turns out that a number of the speakers who make their living consulting and writing books about Java have taken up Ruby on Rails and so maybe 25% of the sessions were about Ruby on Rails (the session by Dave Thomas was maybe one of the best conference sessions I’ve ever been to). I guess all this is to say that it’s not a surprise that the next ACM WebTech group meeting in Waltham is going to be about Ruby on Rails.

Update to embedded Axis application in Tomcat

I got a great email from Tamás in response to my last post who pointed out that the straight copy of deploy.wsdd to server-config.wsdd doesn’t cut it. More importantly, he mentioned that there is a utility that ships with Axis that allows you to generate server-config.wsdd from your deploy.wsdd (or from multiple deploy.wsdd if you have multiple web service end points). From the command line it looks like this:

> java -cp axis.jar;jaxrpc.jar;commons-logging.jar;commons-discovery.jar;saaj.jar;
org.apache.axis.utils.Admin server dir1\deploy.wsdd dir2\deploy.wsdd

But if you’re using the Ant build.xml I provided in the previous example, you’d use this:

   <arg value="server" />
    <arg file="${basedir}\deploy.wsdd" />

I updated the source code example; you can download it here.

NOTE: The source code for the Admin class is available here, where you can see (but the documentation doesn’t mention) that the Admin class accepts multiple WSDD files from the command line.

Embedding an Apache Axis application in Tomcat

One of the applications I’ve been heading up at my 8-to-5 needed a SOAP API that fronted a Java application deployed on Tomcat. If you’ve spent any time with Axis you know that it’s not the simplest thing to deal with; in fact it’s downright complex if you want to do anything more than the simplest thing using SOAP. The simplest? Write your source code, rename the .java file to have a .jws extension and then copy the file into a public directory inside your servlet container (see ‘Deploy a Java Class as a Web Service’). Easy. But this method leaves a lot to be desired: you can’t use packages in the source code, the code is compiled at runtime, which means you don’t find out about compilation errors until after deployment (one way around this would be to first compile the .java file to make sure that it works and then use Ant to copy / rename the .java file to a .jws), and you can’t specify custom type mappings, among other things. Lucky for you, the jws method isn’t the only way you can do it.

The next two options give you flexibility at the additional cost of complexity. The first method is well covered in the documentation: Axis comes packaged with a web application that you can deploy to your servlet container, and you then add your custom services using a remote administration client that is also provided with the war file that you deploy. The downside (at least in my environment) is that this means you now have to maintain and deploy two separate applications: my Java based web application would be deployed to Tomcat and then the same business logic would be deployed to the Axis engine running inside of Tomcat. I felt that it would be simpler to deploy and maintain the web application and the SOAP application together as one application in one war file. That, of course, is not well documented (in fact, other than a PDF file that’s part of the book ‘Java Development with Ant’ written by Erik Hatcher, there is no mention of deploying your web application alongside an Axis application without using the Axis AdminClient). Hence the article you’re reading now.

step 1: set up application / dependent libraries: I’ll make the assumption that you already have a web application that you want to expose SOAP web services with, which means you probably have a file structure that looks something like this:


You’ll need to add the following libraries to your WEB-INF/lib directory:
* axis.jar
* axis-ant.jar
* jaxrpc.jar
* wsdl4j.jar
* commons-logging.jar
* commons-discovery.jar
* saaj.jar

If you don’t already have a web application, you can download the sample application I wrote that exposes a single hello world web service.

Alright, so either you’ve downloaded the sample application or you’ve got your own application and you’ve added the above libraries to your WEB-INF/lib directory. Now you’re ready to write some script to expose your existing classes.

step 2: use WSDL2Java / Java2WSDL to generate the server side wrapper & deployment descriptors for the classes you want to expose. Write an interface:

package net.cephas.soap;
public interface HelloWorld extends java.rmi.Remote {
  public java.lang.String sayHello(java.lang.String in0)
    throws java.rmi.RemoteException;
}

and then a corresponding implementation, which should be named $InterfaceName + SoapBindingImpl:

package net.cephas.soap;
import java.rmi.RemoteException;
public class HelloworldSoapBindingImpl implements HelloWorld {
  public String sayHello(String name) throws RemoteException {
    return "hello " + name;
  }
}

Next, you can use Ant (but you could easily run this from the command line as well) to create the WSDL and generate the server side wrapper & deployment descriptors. You’ll need to define two tasks in Ant:

<taskdef name="axis-java2wsdl" classname="">
  <classpath refid="compile.classpath" />
</taskdef>
<taskdef name="axis-wsdl2java" classname="">
  <classpath refid="compile.classpath" />
</taskdef>

and then you can generate the WSDL:


and generate the server side wrappers:


Copy the generated web service deployment descriptor (deploy.wsdd) file to WEB-INF/server-config.wsdd:

<copy file="generated/net/cephas/soap/deploy.wsdd"
  tofile="WEB-INF/server-config.wsdd" />

and the * file to your source tree:

<copy todir="src/net/cephas/soap" includeEmptyDirs="no">
  <fileset dir="generated/net/cephas/soap">
    <include name="*" />
  </fileset>
</copy>

Finally, compile your source and either jar it up to the WEB-INF/lib directory or deploy the compiled classes to the WEB-INF/classes directory.

step 3: configure web.xml with the appropriate servlet mappings. The last thing you need to do is map a resource path in your application to Axis. You accomplish this by adding a servlet and a servlet-mapping element to your web.xml:

  <display-name>Apache-Axis Servlet</display-name>

You can see that I’ve selected ‘/soap/’ as the resource path which in combination with the server-config.wsdd means that I’ll invoke the SOAP web service using a URL like this:


where embeddedaxis is the name I’ve given the sample application. It will probably be different for your application.

Phew! That’s an awful lot of scripting and configuration to deploy hello world, but at least now you don’t have to rely on .jws files or deploying your application separately from the SOAP service. If you’re trying to do the same thing and something you read above doesn’t make sense, ping me.


Java, Collections and Multimap

I was in an interview recently and was asked a question which I thought the interviewer called an ‘atagram’, but I think it was actually an anagram. He asked how you could find the largest word in a dictionary where subtracting one letter results in another word (i.e., ‘beat’ minus ‘e’ could be ‘tab’). I didn’t come up with a suitable answer during the interview, but during some unrelated reading this week I came across the question and the answer: multimap. Scroll down to the bottom of this page and start reading when you get to multimaps. The trick is that a multimap allows one key to map to multiple values. By alphabetizing the letters of each word in the dictionary and then placing the word in the map keyed by its alphabetized letters, you can easily find all the available words which result from word minus letter.
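The trick described above can be sketched in plain Java, with a Map of Lists standing in for a real multimap class. This is my own illustration rather than the code from the article I read. It assumes Java 8 or later, and every name in it is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnagramMultimap {

    // Sorts the letters of a word so that all anagrams share one key.
    static String key(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Builds the multimap: sorted letters -> every dictionary word with those letters.
    static Map<String, List<String>> index(List<String> dictionary) {
        Map<String, List<String>> byLetters = new HashMap<>();
        for (String word : dictionary) {
            byLetters.computeIfAbsent(key(word), k -> new ArrayList<>()).add(word);
        }
        return byLetters;
    }

    // Returns the words reachable from 'word' by dropping one letter and rearranging.
    static List<String> minusOneLetter(String word, Map<String, List<String>> byLetters) {
        String sorted = key(word);
        List<String> found = new ArrayList<>();
        for (int i = 0; i < sorted.length(); i++) {
            // Dropping either of two equal adjacent letters gives the same key; skip the repeat.
            if (i > 0 && sorted.charAt(i) == sorted.charAt(i - 1)) continue;
            String shorter = sorted.substring(0, i) + sorted.substring(i + 1);
            found.addAll(byLetters.getOrDefault(shorter, new ArrayList<>()));
        }
        return found;
    }

    // Longest dictionary word with at least one minus-one-letter match, or null if none.
    static String longest(List<String> dictionary) {
        Map<String, List<String>> byLetters = index(dictionary);
        String best = null;
        for (String word : dictionary) {
            if ((best == null || word.length() > best.length())
                    && !minusOneLetter(word, byLetters).isEmpty()) {
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("beat", "tab", "bat", "at", "zebra");
        System.out.println(longest(dictionary)); // prints "beat"
    }
}
```

Building the index is one pass over the dictionary, and each lookup afterwards costs one map probe per distinct letter in the word.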

XML characters, smart quotes and Apache XML-RPC

I’ve been eating my own dogfood with the deliciousposter project (as you can see from my daily links). A couple days ago I posted some links and expected them to show up automatically the next day… except they didn’t. I traced it down to an errant smart quote that I copied from the Internet Alchemy post ‘Talis, Web 2.0 and All That’, which caused the Apache XML-RPC library to throw this error: Invalid character data corresponding to XML entity &#8217;

I worked under the assumption that the smart quote was an invalid XML character for quite a while, but it looks like it actually is valid. According to the XML 1.0 specification, the following characters are allowed in an XML document:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

I then checked the source code for the XmlWriter which has this method for writing character data:

if (c < 0x20 || c > 0xff) {
  // Though the XML-RPC spec allows any ASCII
  // characters except '<' and '&', the XML spec
  // does not allow this range of characters,
  // resulting in a parse error from most XML
  // parsers.
  throw new XmlRpcClientException("Invalid character data " +
  "corresponding to XML entity &#" +
  String.valueOf((int) c) + ';', null);
} else ..

which turns out to be a tad aggressive. It also turns out that the above code snippet and the version of the Apache XML-RPC library I was using are out of date. The chardata(String text) method has been updated in the latest version of the Apache XML-RPC library to call a new method, isValidXMLChar(char c), which is much more lenient:

if (c == '\n') return true;
if (c == '\r') return true;
if (c == '\t') return true;
if (c

and, not coincidentally, is compliant with the specification.
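For reference, here is what a check against the character ranges quoted from the specification could look like. This is my own sketch and not the library’s isValidXMLChar source. It takes an int code point rather than a char so that the #x10000-#x10FFFF range can be tested too, and all names are made up.

```java
public class XmlChars {

    // True if the code point falls in one of the ranges the XML spec allows.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first disallowed character, or -1 if the text is clean.
    static int firstInvalid(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over both halves of a surrogate pair
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(isValidXmlChar(0x2019));      // right single quote: true
        System.out.println(firstInvalid("it\u2019s ok")); // -1
    }
}
```

By this check the smart quote (code point 8217, or 0x2019) is a perfectly legal XML character, while the old c > 0xff test rejects it.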

I'll be updating deliciousposter to use the latest version of the Apache XML-RPC library soon. In the meantime, if you're using the Apache XML-RPC library, you should probably download the latest version to take advantage of the new XML character validation method.

eBay Java / C# SOAP Examples

At the start of this year I worked with some guys at eBay to further develop their code samples. Some, but not all, of the twelve examples I wrote went live in the recently launched Community Codebase. You can download all the Java examples (of which I wrote three) or browse the Subversion repository. I wrote the SOAPAddItem, SOAPGetItem, and SOAPGetUser examples in the Java source tree.

The examples I wrote were different from the majority of the (then) existing examples in that I didn’t make any IDE assumptions (most of the Java examples have JBuilder .jpx files and Eclipse .project files, the .NET projects contain the Visual Studio artifacts.. tsk tsk.). As such, all the examples I wrote contain comprehensive Ant / NAnt build files, which means you can get up and running without having to set up your fancy schmancy IDE. But the biggest difference was that all the examples I wrote used either Ant (with WSDL2Java) or NAnt (with NAntContrib) tasks to conditionally download the eBay WSDL, generate the client stub(s), and compile the resulting code. That makes for a prettier source code repository (generated stubs aren’t checked into source) and gives you compile time checking of your code against the API.

If you’re interested in how you can use Ant or NAnt in a build environment where you access SOAP services, you should check it out!