Mintz Levin dot com launch

So after about 12 hours of load testing and final preparations, I’m proud to say that the 2003 version of the Mintz Levin site is now up and running. The site uses web services to publish information from the staging site to the live site, uses isapi_rewrite to provide search engine safe URLs, and runs on what is probably the latest revision of the MINDSEYE CMS framework we’ve been developing internally for the last couple of months.
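
For the curious, the search engine safe URLs boil down to a handful of regex rules in isapi_rewrite’s httpd.ini. A hypothetical sketch (the URL pattern and template name are made up, and the exact escaping varies by isapi_rewrite version):

[ISAPI_Rewrite]
# map /attorneys/123 onto the underlying ColdFusion template
RewriteRule /attorneys/([0-9]+) /attorneys/detail.cfm?id=$1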

If you read my previous posts about full text searching and the speed (or lack thereof) of Verity, you’ll understand why today we ripped out the Verity searching and replaced it with the full text search engine available in SQL Server. A couple of things led to that decision.

First, I couldn’t get anything faster than 250ms per Verity search on an (admittedly slow) single-processor 500MHz Pentium III. Since the site-wide search page lets you search up to 9 collections (and filter down within those sections for a more fine-tuned search), the search page was taking upwards of 2 seconds to execute, which caused the machine to get overwhelmed with anything more than a couple of simultaneous users. Performance Monitor would show the processor pegged at 100%, and cfstat would show requests queuing and the average request time rapidly growing. After replacing the engine, we got the request time down to (in some cases) 16ms, and ColdFusion didn’t spike the processor. In fact, we loaded the box up to 150 concurrent users with no issues (using Apache JMeter, which is a great tool for cheap load testing; check it out sometime). Mind you, it’s a single-processor 500MHz machine. Not bad.

Second, SQL Server full text searching gives us more flexibility in searching: if we want to order the results by datecreated, or by anything other than label or relevance, we simply add an ORDER BY clause to the SQL query.
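
For the curious, a sketch of what that looks like (the table and column names here are hypothetical, not the actual schema):

<cfquery name="qSearch" datasource="site">
	SELECT	articleid, label, datecreated
	FROM	articles
	WHERE	CONTAINS(body, <cfqueryparam value="#form.criteria#" cfsqltype="cf_sql_varchar">)
	ORDER BY datecreated DESC
</cfquery>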

Finally, because the SQL Server full text query runs within a cfquery, you can add the cachedwithin attribute to the query to keep the results directly in RAM. Verity has no cachedwithin equivalent; the only way to cache its results is to stick the resulting query into the application or server scope, which by itself isn’t all that bad, but it requires more coding than simply adding an attribute to a tag.
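
The caching side is literally one attribute (same hypothetical names as above; note that, as I understand it, CFMX won’t let you combine cachedwithin with cfqueryparam, hence the hard-coded criteria here):

<cfquery name="qSearch" datasource="site" cachedwithin="#CreateTimeSpan(0,0,30,0)#">
	SELECT	articleid, label
	FROM	articles
	WHERE	CONTAINS(body, 'widgets')
	ORDER BY datecreated DESC
</cfquery>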

The only time I might still use Verity (i.e., where SQL Server full text searching would not work) is where I needed to index the file system (HTML files, PDF files, Word files) or where I needed to spider a site. Even then, if you have the time to write extra code, it might make sense to use a freely available spider or text-extraction tool to read documents on the file system or spider the site, and then store the resulting text in SQL Server. SQL Server full text searching is obviously tied to a Microsoft/Windows environment; I haven’t yet looked at how MySQL handles full text searching, although I know that Maia has done some work with it on nehra.com. I’d be interested in hearing about anyone else’s experience with full text searching using MS SQL, MySQL or Verity. If you’re reading this post, are having issues with Verity, and don’t have access to MySQL or SQL Server with full text indexing, you might look at writing a CFX that wraps Lucene (an idea that Joe commented on).
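
For reference, pointing Verity at the file system is a single cfindex call; something along these lines (the collection name and paths are made up):

<cfindex collection="sitedocs"
	action="update"
	type="path"
	key="c:\inetpub\wwwroot\docs\"
	extensions=".htm, .html, .pdf, .doc"
	recurse="yes"
	urlpath="/docs/">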

Update: David and Ray sent a couple of links:

Building Search Applications for the Web Using Microsoft SQL Server 2000 Full-Text Search [source]

Determining Current Version of CFMX [source, source]

The Goal: A Process of Ongoing Improvement

I finished reading “The Goal: A Process of Ongoing Improvement” today (a sunny, windy, beautiful day in Boston, btw), which Joel pointed to a couple of months back. Like Joel, I’ll find it useful should I ever need to run a factory, but it also struck a chord with me on a non-programming, more business-like level. One of the takeaways from the book is its description of what the goal of a for-profit organization should be:

“… so I can say that the goal is to increase net profit, while simultaneously increasing both ROI and cash flow and that’s the equivalent of saying the goal is to make money…. They’re measurements which express the goal of making money perfectly well, but which also permit you to develop operational rules for running your plant. Their names are throughput, inventory and operational expense… Throughput is the rate at which the system generates money through sales…. Inventory is all the money that the system has invested in purchasing things which it intends to sell…. Operational expense is all the money the system spends in order to turn inventory into throughput.” (pg 59-60)

Selling physical widgets is a much different ballgame than selling services or software, so I’m having a hard time imagining how this might apply to a web shop like MINDSEYE, but it’s an interesting mental exercise nonetheless.

Another interesting concept (which I’m guessing is covered in the Critical Chain book) is the idea that in a multi-step synchronous process, the fluctuations in speed of the various steps result not in an averaging out of the time the system needs to run, but rather in an accumulation of the fluctuations. His example is a hike with 15 Boy Scouts: if you let one kid lead the pack and that kid averages between 1.0 and 2.0 mph, the distance between the first and the 15th scout will never average out but will gradually increase over time, because “… dependency limits the opportunities for higher fluctuations.” (pg 100) In this case, the first scout doesn’t have any dependencies, but each of the next 14 does: the person in front of them. Thus, as the hike progresses, the other 14 can never go faster than the person in front of them, and so on down the line. This actually has some usefulness in our daily jobs as programmers. Each of us depends on someone in one way or another, whether it be for design assets or IA docs or a software function. A breakdown in one step of a project can sometimes be made up through sheer willpower and a lot of caffeine, but most likely it will result in the project being delayed. The same thinking can be applied to software that we write: the statistical fluctuations in functions (fast or slow) don’t average out so much as they accumulate, which I guess is kind of obvious, but worth noting.
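
If you want to see the accumulation for yourself, here’s a quick and dirty cfscript simulation of the hike (entirely my own sketch, not from the book): the leader walks at a random 1.0 to 2.0 mph, each follower rolls the same dice but can never pass the scout ahead, and the gap between first and last only grows.

<cfscript>
// 15 scouts, all starting at mile 0
numScouts = 15;
pos = ArrayNew(1);
for (i = 1; i LTE numScouts; i = i + 1) {
	pos[i] = 0;
}
// walk for 100 "hours"
for (t = 1; t LTE 100; t = t + 1) {
	// the leader has no dependencies: a random speed between 1.0 and 2.0 mph
	pos[1] = pos[1] + 1 + Rand();
	// each follower rolls the same dice but can never pass the scout ahead
	for (i = 2; i LTE numScouts; i = i + 1) {
		pos[i] = Min(pos[i] + 1 + Rand(), pos[i-1]);
	}
}
WriteOutput("Gap between first and last scout: " & DecimalFormat(pos[1] - pos[numScouts]) & " miles");
</cfscript>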

Other nuggets to chew on this morning: “The maximum deviation of a preceding operation will become the starting point of a subsequent operation.” (pg 133)

On bottlenecks within an operation: “… Make sure the bottleneck works only on good parts by weeding out the ones that are defective. If you scrap a part before it reaches the bottleneck, all you have lost is a scrapped part. But if you scrap the part after it’s passed the bottleneck, you have lost time that cannot be recovered.” (pg 156) Do you have expensive (in terms of processor cycles) operations in a computer program after which you then scrub the data for quality? Why not put the scrub before the expensive operation and save yourself some cycles?
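
In ColdFusion terms, a contrived sketch (recordIsOK() and expensiveTransform() are stand-ins for whatever your cheap check and costly step actually are):

<cfscript>
// weed out defective "parts" BEFORE the bottleneck, not after
for (i = 1; i LTE ArrayLen(records); i = i + 1) {
	// cheap quality check first...
	if (recordIsOK(records[i])) {
		// ...so the expensive step only ever sees good records
		expensiveTransform(records[i]);
	}
}
</cfscript>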

On problem solving: “… people would come to me from time to time with problems in mathematics they couldn’t solve. They wanted me to check their numbers for them. But after a while I learned not to waste my time checking the numbers — because numbers were almost always right. However, if I checked the assumptions, they were almost always wrong.” (pg 157) Next time someone asks you to review some code that isn’t producing the correct results, attack their assumptions before going over the code with a fine-tooth comb.

BTW, the book is written as a story, so if you like reading stories and want to stretch the business side of your brain but don’t enjoy business textbooks, this book is for you!

Update: Jeff Sutherland has a great review of the book from a technical perspective here.

new Macromedia.com

So the new Macromedia.com site went live yesterday, and there’s been lots of reaction to it on the various lists I’m on. Notably, very few of the comments were positive; most bashed the use of Flash, or the speed, or [insert pet peeve here]. How about a round of applause for pushing the envelope and eating your own dog food? They’ve just converted an enormous site (really the combination of what were two distinct companies just a short time ago) into a massive rich Internet application. Congratulations on a job well done! [via JD on MX]

using Verity and ColdFusion MX

Spent all day yesterday battling last-minute emergencies for a site that’s about to launch. One of the ‘problems’ was the speed, or actually the lack thereof, of full text searches using Verity and ColdFusion MX (documentation here). The site I’m working on has 9 full text collections (each one representing a different area of the site) and includes multiple forms allowing for full text search. One of these forms actually searches all 9 collections and returns the combined results. In my testing, I couldn’t get any better than approximately 250ms per search, which meant that the all-collections search would take a minimum of 2.25 seconds, which isn’t acceptable at this point. I did optimize the Verity collections (multiple times); many thanks to Geoff Bowers for this great post on ‘Verity Optimisation‘ (even if they do spell it wrong over there), as well as for his help offline. Geoff mentioned that my best bet would be to use K2, but it looks like we’re not going to be able to move to that before launch. A couple of interesting things came out of an internal discussion about using K2 within CFMX (mainly from Ray Camden, thanks Ray!):

a) K2 reads in the collections at startup and apparently will not see updates to its collections until it is restarted. Anyone have official documentation that says that? (Update: I do see this note: “Macromedia does not recommend running K2 Server as a Windows service. You must stop the service before you modify or delete collections registered with K2 Server. You must then remember to restart the service. You must also verify that the vdkHome information in your k2server.ini file is uncommented (that is, it has no leading pound (#) signs) and points to the correct location of the common directory.”)

b) because of a) above, you should probably restart K2 using cfexecute after a collection has been updated, or on a schedule using a Scheduled Task (see the sketch after this list)

c) because of b) above, there is a handy function, IsK2ServerOnline(), that you can use to determine whether or not the server is online and, if not, gracefully handle the error condition; both ideas are sketched below.
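
Putting b) and c) together, the glue code is pretty short. A rough sketch (the executable path, ini location and collection names are all made up and will vary by install, so treat this as pseudocode more than gospel):

<!--- after a collection update, bounce K2 (having first stopped the running instance) --->
<cfexecute name="c:\CFusionMX\lib\_nti40\bin\k2server.exe"
	arguments="-ini c:\CFusionMX\lib\k2server.ini"
	timeout="10" />

<!--- at search time, fail gracefully if K2 isn't back up yet --->
<cfif IsK2ServerOnline()>
	<cfsearch name="qResults" collection="news,bios,practices" criteria="#form.criteria#">
<cfelse>
	<cfoutput>Search is temporarily unavailable; please try again in a few minutes.</cfoutput>
</cfif>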