{"id":730,"date":"2005-12-02T14:27:39","date_gmt":"2005-12-02T18:27:39","guid":{"rendered":"http:\/\/wordpress.cephas.net\/?p=730"},"modified":"2014-04-23T05:29:31","modified_gmt":"2014-04-23T13:29:31","slug":"java-jtds-preparedstatement-and-varchar","status":"publish","type":"post","link":"https:\/\/cephas.net\/blog\/2005\/12\/02\/java-jtds-preparedstatement-and-varchar\/","title":{"rendered":"Java, JTDS, PreparedStatement and varchar"},"content":{"rendered":"<p>I&#8217;ve been working on an interesting application at work that needs to be fast, the faster the better in fact. I wrote a couple quick and dirty implementations in my scratchpad in Eclipse and I figured that I could get about fifty operations per second (a database UPDATE is involved for every operation among other things). Anyway, I went to develop a full implementation and a then ran a full test of about 100,000 operations.  Instead of taking about 30 minutes (100,000 operations \/ 50 per second = ~ 30 minutes) the operation took about 7 hours. I was getting about 4 operations per second throughput, which was obviously a huge disappointment. The pseudocode I wrote originally looked something like  this:<br \/>\n<code><br \/>\nConnection c = DriverManager.getConnection(cs);<br \/>\nString q = \"UPDATE mytable SET x = 1 WHERE id = ?\";<br \/>\nPreparedStatement p = c.prepareStatement(q);<br \/>\nfor (int i=0; i<br \/>\n<\/code><br \/>\nand it worked well.  I made a single change during development: instead of using the &#8216;id &#8216; column of the database table (a numeric 9 byte primary key and thus is the clustered index for the table) I used a 13 byte varchar column as the identifier which had a nonclustered index, my code looked like this:<br \/>\n<code><br \/>\nConnection c = DriverManager.getConnection(cs);<br \/>\nString q = \"UPDATE mytable SET x = 1 WHERE y = ?\";<br \/>\nPreparedStatement p = c.prepareStatement(q);<br \/>\nfor (int i=0; i<br \/>\n<\/code><br \/>\nThe nonclustered index performed just as well as the clustered index: in my testing an UPDATE statement using the varchar column as the constraint in the query worked just as fast as the primary key \/ clustered index, which makes sense because index seeks (which I learned about in my <a href=\"http:\/\/moosehead.cis.umassd.edu\/cis552\">database design<\/a> class this semester) on a 9 byte \/ 72 bit numeric value (because I used a <a href=\"http:\/\/msdn.microsoft.com\/library\/default.asp?url=\/library\/en-us\/tsqlref\/ts_de-dz_3grn.asp\">precision of 19 digits<\/a>) should be similar to index seeks on a 13 byte \/ 104 bit varchar column.  So then I executed the finished program (not the test) and brought up <a href=\"http:\/\/msdn.microsoft.com\/library\/default.asp?url=\/library\/en-us\/dnpag\/html\/scalenethowto15.asp\">SQL Profiler<\/a> (a tool that ships with SQL Server that can debug, troubleshoot, monitor, and measure your application&#8217;s SQL statements and stored procedures). It quickly became clear what the problem was. Here&#8217;s the SQL created by the prepareStatement() method:<br \/>\n<code><br \/>\ncreate proc #jtds000001 @P0 varchar(4000) as UPDATE mytable SET x = 1 WHERE y = @P0<br \/>\n<\/code><br \/>\nand then the <code>executeUpdate()<\/code> method:<br \/>\n<code><br \/>\nexec #jtds000001 N'005QDUKS1MG8K'<br \/>\n<\/code><br \/>\nSee the problem? The <a href=\"http:\/\/jtds.sourceforge.net\/\">JTDS driver<\/a> turned the 13 byte varchar column into a 4000 byte varchar column (the maximum number of bytes for a column) and then prefixed the parameter with &#8216;n&#8217;, which is <a href=\"http:\/\/msdn.microsoft.com\/library\/default.asp?url=\/library\/en-us\/createdb\/cm_8_des_04_6zp0.asp\">used to identify Unicode data types<\/a>. This substitution caused the query processor to ignore the index on &#8216;y&#8217; and do an index scan instead of an index seek.  <\/p>\n<p>Here&#8217;s where is gets fun.  Microsoft SQL Server uses a <a href=\"http:\/\/msdn.microsoft.com\/library\/default.asp?url=\/library\/en-us\/architec\/8_ar_da2_8sit.asp\">B-tree index structure<\/a> (also on <a href=\"http:\/\/en.wikipedia.org\/wiki\/B_tree\">wikipedia<\/a>), which is similar to a <a href=\"http:\/\/en.wikipedia.org\/wiki\/B_plus_tree\">B+tree<\/a>, except that search key values can only appear once in the tree. Objects are stored in SQL Server as a collection of 8KB pages and (because of the class I&#8217;ve been taking) I now know that you can compute the approximate number of disk IO&#8217;s for an index seek as:<br \/>\n<code><br \/>\nlog<sub>n\/2<\/sub>(k)<br \/>\n<\/code><br \/>\nwhere n is the number of keys per node and k is the number of search keys. So with one million search keys and 8KB pages in SQL Server, a index on a 13 byte key would create a tree with about 615 nodes (~8000 \/ 13 = ~615). Thus the index seek in my system was costing about log<sub>615\/2<\/sub>(1000000) = 2.4 node accesses (one node access ~= one disk IO) versus an index scan (615 nodes @ 8KB each, figure that on average over time we&#8217;ll find the value in 615\/2 so ~307 node accesses?) which is significantly longer and obviously the cause of the problem.<\/p>\n<p>Moral of the story: watch out for char \/ varchar constraint parameters when using JTDS and a PreparedStatement. Also, indexes are A Good Thing<sup>TM<\/sup>.<\/p>\n<p><font color=\"red\">Updated 12\/04\/2005: <a href=\"http:\/\/sourceforge.net\/users\/bheineman\/\">Brian Heineman<\/a> (one of the maintainers of the JTDS project) points out that this is a feature, not a bug. He also points out that you can work around the issue by appending:<br \/>\n<code><br \/>\nsendStringParametersAsUnicode=false;<br \/>\n<\/code><br \/>\nto your database connection string (I tested it out and it works just as advertised).  Since the real issue is that JTDS can&#8217;t tell if the String instance I&#8217;m sending is Unicode or not and so defaults to a Unicode string, the other workaround would be to use the setBytes() method of the PreparedStatement and the use the byte[] representation of the String. From my example above:<br \/>\n<code><br \/>\np.setBytes(1, somearray[i].getBytes());<br \/>\n<\/code><br \/>\n<\/font><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been working on an interesting application at work that needs to be fast, the faster the better in fact. I wrote a couple quick and dirty implementations in my scratchpad in Eclipse and I figured that I could get about fifty operations per second (a database UPDATE is involved for every operation among other &hellip; <a href=\"https:\/\/cephas.net\/blog\/2005\/12\/02\/java-jtds-preparedstatement-and-varchar\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Java, JTDS, PreparedStatement and varchar<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[3,2],"tags":[],"_links":{"self":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/730"}],"collection":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/comments?post=730"}],"version-history":[{"count":1,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/730\/revisions"}],"predecessor-version":[{"id":1841,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/posts\/730\/revisions\/1841"}],"wp:attachment":[{"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/media?parent=730"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/categories?post=730"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cephas.net\/blog\/wp-json\/wp\/v2\/tags?post=730"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}