Java, JTDS, PreparedStatement and varchar

I’ve been working on an interesting application at work that needs to be fast, the faster the better in fact. I wrote a couple quick and dirty implementations in my scratchpad in Eclipse and I figured that I could get about fifty operations per second (a database UPDATE is involved for every operation among other things). Anyway, I went to develop a full implementation and a then ran a full test of about 100,000 operations. Instead of taking about 30 minutes (100,000 operations / 50 per second = ~ 30 minutes) the operation took about 7 hours. I was getting about 4 operations per second throughput, which was obviously a huge disappointment. The pseudocode I wrote originally looked something like this:

Connection c = DriverManager.getConnection(cs);
String q = "UPDATE mytable SET x = 1 WHERE id = ?";
PreparedStatement p = c.prepareStatement(q);
for (int i=0; i

and it worked well. I made a single change during development: instead of using the ‘id ‘ column of the database table (a numeric 9 byte primary key and thus is the clustered index for the table) I used a 13 byte varchar column as the identifier which had a nonclustered index, my code looked like this:

Connection c = DriverManager.getConnection(cs);
String q = "UPDATE mytable SET x = 1 WHERE y = ?";
PreparedStatement p = c.prepareStatement(q);
for (int i=0; i

The nonclustered index performed just as well as the clustered index: in my testing an UPDATE statement using the varchar column as the constraint in the query worked just as fast as the primary key / clustered index, which makes sense because index seeks (which I learned about in my database design class this semester) on a 9 byte / 72 bit numeric value (because I used a precision of 19 digits) should be similar to index seeks on a 13 byte / 104 bit varchar column. So then I executed the finished program (not the test) and brought up SQL Profiler (a tool that ships with SQL Server that can debug, troubleshoot, monitor, and measure your application’s SQL statements and stored procedures). It quickly became clear what the problem was. Here’s the SQL created by the prepareStatement() method:

create proc #jtds000001 @P0 varchar(4000) as UPDATE mytable SET x = 1 WHERE y = @P0

and then the executeUpdate() method:

exec #jtds000001 N'005QDUKS1MG8K'

See the problem? The JTDS driver turned the 13 byte varchar column into a 4000 byte varchar column (the maximum number of bytes for a column) and then prefixed the parameter with ‘n’, which is used to identify Unicode data types. This substitution caused the query processor to ignore the index on ‘y’ and do an index scan instead of an index seek.

Here’s where is gets fun. Microsoft SQL Server uses a B-tree index structure (also on wikipedia), which is similar to a B+tree, except that search key values can only appear once in the tree. Objects are stored in SQL Server as a collection of 8KB pages and (because of the class I’ve been taking) I now know that you can compute the approximate number of disk IO’s for an index seek as:

logn/2(k)

where n is the number of keys per node and k is the number of search keys. So with one million search keys and 8KB pages in SQL Server, a index on a 13 byte key would create a tree with about 615 nodes (~8000 / 13 = ~615). Thus the index seek in my system was costing about log615/2(1000000) = 2.4 node accesses (one node access ~= one disk IO) versus an index scan (615 nodes @ 8KB each, figure that on average over time we’ll find the value in 615/2 so ~307 node accesses?) which is significantly longer and obviously the cause of the problem.

Moral of the story: watch out for char / varchar constraint parameters when using JTDS and a PreparedStatement. Also, indexes are A Good ThingTM.

Updated 12/04/2005: Brian Heineman (one of the maintainers of the JTDS project) points out that this is a feature, not a bug. He also points out that you can work around the issue by appending:

sendStringParametersAsUnicode=false;

to your database connection string (I tested it out and it works just as advertised). Since the real issue is that JTDS can’t tell if the String instance I’m sending is Unicode or not and so defaults to a Unicode string, the other workaround would be to use the setBytes() method of the PreparedStatement and the use the byte[] representation of the String. From my example above:

p.setBytes(1, somearray[i].getBytes());

This entry was posted in J2EE, Software Development. Bookmark the permalink.

6 Responses to Java, JTDS, PreparedStatement and varchar

  1. lhotari says:

    Nice description of the problem. I’ve filed a feature request in the jtds tracker for fixing this issue.
    http://sourceforge.net/tracker/index.php?func=detail&aid=1372058&group_id=33291&atid=407765

  2. Most SQL Server JDBC drivers have a parameter for controlling how character data is sent to the database. I recommend reading the documentation for each driver as you will learn what options are available for performance tuning. For instance, you can change the default behavior of the jTDS driver ( and several other drivers) using sendStringParametersAsUnicode=false.

    From the jTDS FAQ (http://jtds.sourceforge.net/faq.html):

    sendStringParametersAsUnicode (default – true)
    Determines whether string parameters are sent to the SQL Server database in Unicode or in the default character encoding of the database. This seriously affects SQL Server 2000 performance since it does not automatically cast the types (as 7.0 does), meaning that if a index column is Unicode and the string is submitted using the default character encoding (or the other way around) SQLServer will perform an index scan instead of an index seek. For Sybase, determines if strings that cannot be encoded in the server’s charset are sent as unicode strings. There is a performance hit for the encoding logic so set this option to false if unitext or univarchar data types are not in use or if charset is utf-8.

    Cheers,
    -Brian

  3. You might also consider using batch updates to speed up things. I realize that it could be difficult in most situations and with existing designs but with the amount of data you’re dealing it could make a huge difference.

    The idea with batch updates is that performance would be less affected by network latency.

    Alin.

  4. Joe Dunleavy says:

    Can you post the sql profiler output of when you you set
    sendStringParametersAsUnicode=FALSE?
    I’d like to compare the create proc statements.

  5. I dont understand “if the string instance is unicode or not”, since Strings are always in UTF-16 in java. Does it mean “if the string contains characters outside of the intended non-unicode codepage”? In that case it could check that by converting it.

  6. Lắm says:

    thank you for your help ! this article ‘s very useful .

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>