All posts by ajohnson

more books…

Currently in Mammoth, hanging out at my parents’ house, where we’re in the midst of a[nother] snowstorm. This one could dump as much as 2 feet!

So far I’ve finished 3 books on this trip: Bots: The Origin of New Species by Andrew Leonard, The Future of Ideas by Lawrence Lessig, and Fast Food Nation by Eric Schlosser. All great books; I’ve got a ton of dog-eared pages in each that I’d like to blog… but using the computer is off limits on vacation, right?

Software Development Tools

I’m guessing that in every trade or craft, the ‘craftsmen’ delight in sharing (or not sharing!) their favorite tools, the objects that make their day. For cooks, maybe it’s a special pot or pan; for writers, maybe a special pen. For software developers, well… we just have software. Scott mentions a list of tools he used in a recent demonstration at Microsoft, most of which were new (and very interesting) to me. I’m pasting it below so that I don’t have to check out his page every time I want to remember where something was:

Gödel, Escher, Bach: An Eternal Golden Braid

more from “Gödel, Escher, Bach: An Eternal Golden Braid”:

information-revealers — devices like a record or CD player which can pull information out of an information-bearer

information-bearer — a device that holds information…

Those words aren’t very exciting until you start thinking about stuff like this:

“Take the case of the genetic information commonly said to reside in the double helix of deoxyribonucleic acid (DNA). A molecule of DNA — a genotype — is converted into a physical organism — a phenotype — by a very complex process, involving the manufacture of proteins, the replication of the DNA, the replication of cells, the gradual differentiation of cell types, and so on. Incidentally, this unrolling of phenotype from genotype — epigenesis — is the most tangled of tangled recursions… Epigenesis is guided by a set of enormously complex cycles of chemical reactions and feedback loops. By the time the full organism has been constructed, there is not even the remotest similarity between its physical characteristics and its genotype. … And yet, it is standard practice to attribute the physical structure of the organism to the structure of its DNA, and to that alone. The first evidence for this point of view came from experiments conducted by Oswald Avery in 1946, and overwhelming corroborative evidence has since been amassed. Avery’s experiments showed that, of all the biological molecules, only DNA transmits hereditary properties. One can modify other molecules in an organism, such as proteins, but such modifications will not be transmitted to later generations. However, when DNA is modified, all successive generations inherit the modified DNA. Such experiments show that the only way of changing the instructions for building a new organism is to change the DNA — and this, in turn, implies that those instructions must be coded somehow in the structure of DNA.” [pg 158]

Still with Douglas? Continuing on with the next paragraph: “Therefore one seems forced into accepting the idea that the DNA’s structure contains the information of the phenotype’s structure, which is to say, the two are isomorphic. However, the isomorphism is an exotic one, by which I mean that it is highly nontrivial to divide the phenotype and the genotype into “parts” which can be mapped onto each other. Prosaic isomorphisms, by contrast, would be ones into which the parts of one structure are easily mappable onto the parts of the other. An example is the isomorphism between a record and a piece of music, where one knows that to any sound in the piece there exists an exact “image” in the patterns etched into the grooves, and one could pinpoint it arbitrarily accurately, if the need arose… The isomorphism between DNA structure and phenotype structure is anything but prosaic, and the mechanism which carries it out physically is awesomely complicated. For instance, if you wanted to find some piece of your DNA which accounts for the shape of your nose or the shape of your fingerprint, you would have a very hard time. It would be a little like trying to pin down the note in a piece of music which is the carrier of the emotional meaning of the piece. Of course there is no such note, because the emotional meaning is carried on a very high level, by large “chunks” of the piece, not by single notes. Incidentally, such “chunks” are not necessarily sets of contiguous notes; there may be disconnected sections which, taken together, carry some emotional meaning.” [pg 160]

Page 168 has a fascinating collage of various scripts including Mongolian and Buginese. It boggles my mind that the words I’m writing right now using the English alphabet mean absolutely nothing to people in other parts of the world.

All for tonight…

Verity Spider tips & tricks

Thanks to Phil for sending me a link to the Verity Spider tips & tricks on daemon.com.au. Daemon is/was a big Spectra shop and probably used the spider to search Spectra sites on a regular basis. So why doesn’t that page show up in a Google search for “verity spider” or “verity spider tips”? Maybe it’s because of the way their content management system works: each page is identified by a CF UUID appended to the URL. That probably helps the developers, but in the long run it isn’t so good for getting ranked, or even indexed, by the larger search engines… which led me to today’s research: mod_rewrite.

I got my MCSE from Microsoft a couple of years ago, so my first exposure to web servers was IIS. IIS was then, and for the most part still is, very pointy-clicky (although I’ve heard that .NET IIS will have a text-based configuration file). Anyway, Apache wasn’t something I played with much until the last year, when I brought up a couple of Linux machines, and thus Apache. So today I dove headfirst into mod_rewrite and came up with a solution for making the next version of karensrecipes.com (due out any day now) more search engine friendly. In short, to get to a recipe on the development site right now, you’d type in something like this:

http://www.karensrecipes.com/recipes/detail.jsp?r=18

Again, just like the link I mentioned above, this is not an example of how to impress the search engines. Some kung fu regular expressions and a dab of JkMount knowledge, and we now get something like this:

http://www.karensrecipes.com/recipes/18/Steamed_Mussels.jsp

and in your Apache httpd.conf:

RewriteEngine on
RewriteRule ^/recipes/([0-9]+)/.*$ /recipes/detail.jsp?r=$1 [PT]

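which in English says something like “if the request starts with ‘/recipes/’, followed by one or more digits, then a ‘/’ and any number of other characters, rewrite the URL to /recipes/detail.jsp?r= plus those digits.” The [PT] (passthrough) flag hands the rewritten URL back to Apache’s URL mapping instead of treating it as a file path, which is what lets the JkMount for the JSPs pick it up. (Wanna know more about regular expressions? Get this fabulous book!)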
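If the regex is hard to eyeball, here’s a small plain-Java sketch (using java.util.regex) that applies the same pattern and substitution to a request path, handy for checking which URLs the rule will catch before touching httpd.conf. The class name RecipeRewrite is made up for illustration, and Apache’s own matching has more to it (flags, per-directory context), so treat this as an approximation of the rule rather than a substitute for testing in Apache.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics the RewriteRule from httpd.conf:
//   ^/recipes/([0-9]+)/.*$  ->  /recipes/detail.jsp?r=$1
class RecipeRewrite {
    // "/recipes/", one or more digits (captured as group 1), a "/", then anything.
    private static final Pattern RULE = Pattern.compile("^/recipes/([0-9]+)/.*$");

    // Returns the internal URL if the path matches, otherwise the path unchanged
    // (a non-matching RewriteRule likewise leaves the request alone).
    static String rewrite(String path) {
        Matcher m = RULE.matcher(path);
        if (m.matches()) {
            return "/recipes/detail.jsp?r=" + m.group(1);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/recipes/18/Steamed_Mussels.jsp")); // /recipes/detail.jsp?r=18
        System.out.println(rewrite("/recipes/18/"));                    // /recipes/detail.jsp?r=18
        System.out.println(rewrite("/about.jsp"));                      // /about.jsp
    }
}
```

Note that the title part of the URL is ignored entirely; only the digits after /recipes/ make it to detail.jsp.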

Pretty snazzy, eh? It gives me warm feelings inside because my JSP/Servlet code has no idea that funny stuff is being done to the URL in Apache, which means you can do all sorts of chicanery to your URLs without changing a lick of server-side code.
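The rewrite takes care of incoming requests; for search engines to actually see the friendly URLs, the links on the site would presumably need to use that form too. Here’s one way that could look, as a hypothetical helper (RecipeLinks.friendlyUrl is my own name, not anything from the site’s code) that turns an id and a title into a URL the rule above will accept. It keeps only ASCII letters and digits from the title, which is an assumption; since the rule ignores everything after the id, any cleanup scheme would work.

```java
// Hypothetical link builder for the "/recipes/<id>/<Title>.jsp" form.
class RecipeLinks {
    static String friendlyUrl(int id, String title) {
        String slug = title.trim()
                .replaceAll("[^A-Za-z0-9]+", "_")  // runs of spaces/punctuation -> "_"
                .replaceAll("^_+|_+$", "");        // drop leading/trailing "_"
        if (slug.isEmpty()) {
            slug = "recipe";  // hypothetical fallback when the title has no usable characters
        }
        return "/recipes/" + id + "/" + slug + ".jsp";
    }

    public static void main(String[] args) {
        System.out.println(friendlyUrl(18, "Steamed Mussels")); // /recipes/18/Steamed_Mussels.jsp
    }
}
```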

Search-Enable Your Application with Lucene

Reading this month’s Java Developer’s Journal while exercising today, specifically the article titled “Search-Enable Your Application with Lucene”. A couple of months ago, when I first added Lucene searching to this site, I thought it would be a great feature to be able to index a URL. For example, when creating and updating an index of the files in a directory on the file system, you’d do something like this:

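IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
File file = new File("c:\\htmlToIndex");
String[] files = file.list();
for (int i = 0; i < files.length; i++) {
    // ... (the body of this loop was lost from the original post)
}
writer.close(); // close the writer so the index is written out

… Verity Spidering. Very nice! So I guess the same code I mentioned above could be done from the command line like so: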

c:\cfusionmx\lib\_nti40\bin\vspider -common c:\cfusionmx\lib\common -collection c:\new -start http://www.mysite.com/products/? -indinclude *
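Whether it’s the Lucene IndexWriter loop or vspider doing the crawling, what comes out the other end is basically an inverted index: a map from each term to the documents that contain it. Here’s a toy, in-memory version in plain Java just to show the idea; ToyIndex and its methods are made up for illustration and are nothing like Lucene’s API or on-disk format (no scoring, no phrase queries, a crude tokenizer).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> sorted names of the documents containing it.
class ToyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lowercase the text, split on anything that isn't an ASCII letter or digit,
    // and record the document name under every resulting term.
    void addDocument(String name, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(name);
            }
        }
    }

    // Documents containing a single search term (case-insensitive).
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("mussels.html", "Steamed mussels with garlic and wine");
        index.addDocument("bread.html", "Garlic bread");
        System.out.println(index.search("garlic")); // [bread.html, mussels.html]
        System.out.println(index.search("wine"));   // [mussels.html]
    }
}
```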

But one of the advantages Lucene has over a product like Verity is the ability to customize the indexing and searching routines. For instance, one of the examples the author (Craig Walls) gave was adding synonym matching to your indexing routine. In Lucene, if you want to add synonyms to keywords, you write a short subclass of TokenFilter (he provided an example in the source code) and you’re done. To the best of my knowledge, you can’t do that with Verity. Correction: you can’t “extend” Verity… but it comes with a similar feature called “THESAURUS” (“Expands the search to include the word that you enter and its synonyms”). I’ve not spent much time with Verity, but the evidence operators on the CFMX docs page are really intriguing, specifically the “THESAURUS”, “SOUNDEX” and “TYPO/N” evidence operators.
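Here’s roughly what synonym expansion at indexing time amounts to, as a plain-Java sketch rather than a real Lucene TokenFilter: it works on a List&lt;String&gt; of tokens instead of a TokenStream, and the SynonymExpander class and its synonym table are made up for illustration. Each token is passed through, followed by its synonyms, so a search for either word would hit the same document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a synonym TokenFilter, operating on a plain list of tokens.
class SynonymExpander {
    private final Map<String, List<String>> synonyms;

    SynonymExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Emit each token, immediately followed by any synonyms registered for it.
    List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            out.addAll(synonyms.getOrDefault(token, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        SynonymExpander expander = new SynonymExpander(
                Map.of("quick", List.of("fast", "speedy")));
        System.out.println(expander.expand(List.of("quick", "bread")));
        // [quick, fast, speedy, bread]
    }
}
```

A real Lucene filter would also have to decide where the extra tokens sit in the stream (for example, at the same position as the original word), which this list-based sketch ignores.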