Dangerous Ideas Seminar: Why I hate programming

If you’re in the New England area next week, you should check out the “Why I hate programming” seminar at MIT. The abstract sounds interesting:

Over the past thirty years a host of new ideas about programming have
emerged from this building, yet the average engineer has seen little
change in the drudgery of day to day programming. Why is it that we
have not seen large-scale improvements in our programming environments
and methodology? To answer this question I will share a few lessons
and trends picked up from industry and the implications I think these
have for the future.

I will argue that, in part, we have not been solving the right
problems. Far too little of the techniques learned in the pursuit of
AI and the advancement of computer science are employed in our
programming environments and these environments are of too limited
scope. I argue that visibility into behavior is more important than
specific language semantics. I illustrate why testing is more
fundamental to good programming than coding. I explore why the
ambiguity in most projects is actually backwards; typically found in
the design specification and not in the implementation. I’ll propose
that languages hurt abstraction and reuse by requiring programs to be
too specific and introduce some ideas on how to avoid this.

This talk addresses the fundamental question of how to make Moore’s
law work for programmers as well as users: enabling software to be
faster to create, easier to evolve and more robust to run.

More information can be found at the Dangerous Ideas site.

Shallow comparison of ASP.NET to Flex

Flex seems pretty interesting when you realize how similar it is to something like ASP.NET. Compare this snippet of Flex:

<?xml version="1.0" encoding="iso-8859-1"?>
<mx:Application xmlns:mx="http://www.macromedia.com/2003/mxml">
   <mx:script>
   function copy() {
      destination.text=source.text
      }
   </mx:script>
   <mx:TextInput id="source" width="100"/>
   <mx:Button label="Copy" click="copy()"/>
   <mx:TextInput id="destination" width="100"/>
</mx:Application>

to this snippet of ASP.NET code:

<script language="C#" runat="server">
   public void Copy(Object src, EventArgs e) {
      destination.Text = source.Text;
   }
</script>
<form runat="server">
<asp:textbox id="source" size="30" runat="server" />
<asp:button id="copy" OnClick="Copy" text="Copy" runat="server" />
<asp:textbox id="destination" size="30" runat="server" />
</form>

I can’t wait to see the IDE rolled out for Eclipse and the .NET version. Cool stuff Macromedia!

QueryParser … in NLucene

Misleading title. I implemented the first of the examples that Erik Hatcher used in his article about the Lucene QueryParser, only I used NLucene. Lucene and NLucene are very similar, so if anything, it’s interesting only because it highlights a couple of the differences between C# and Java.

First, here’s the Java example taken directly from Erik’s article:

public static void search(File indexDir, String q) {
  Directory fsDir = FSDirectory.getDirectory(indexDir, false);
  IndexSearcher is = new IndexSearcher(fsDir);
  Query query = QueryParser.parse(q, "contents", new StandardAnalyzer());
  Hits hits = is.search(query);
  System.out.println("Found " + hits.length() +
    " document(s) that matched query '" + q + "':");
  for (int i = 0; i < hits.length(); i++) {
    // ...
  }
}

The NLucene version looks eerily similar:

public static void Search(DirectoryInfo indexDir, string q) {
  DotnetPark.NLucene.Store.Directory fsDir = FsDirectory.GetDirectory(indexDir, false);
  IndexSearcher searcher = new IndexSearcher(fsDir);
  Query query = QueryParser.Parse(q, "contents", new StandardAnalyzer());
  Hits hits = searcher.Search(query);
  Console.WriteLine("Found " + hits.Length +
    " document(s) that matched query '" + q + "':");
  for (int i = 0; i < hits.Length; i++) {
    // ...
  }
}

The differences are mainly syntax.

First, Erik used the variable name 'is' for his IndexSearcher. In C# 'is' is a keyword, so I switched the variable name to 'searcher'. If you're really geeky, you might want to brush up on all the Java keywords and the C# keywords.

Second, while Java uses the File class to describe directories and files, the .NET Framework uses the DirectoryInfo class.

Third, Java programmers are encouraged to capitalize class names and use camelCase for method and variable names, while C# programmers are encouraged to use PascalCase for methods and camelCase for variables, so I switched the static method name from 'search' to 'Search'.

Next, 'Directory' is also the name of a class in the System.IO namespace, so the reference to the NLucene directory needed to be fully qualified:

DotnetPark.NLucene.Store.Directory fsDir = FsDirectory.GetDirectory(indexDir, false);

rather than this:

Directory fsDir = FsDirectory.GetDirectory(indexDir, false);

Finally, the Hits class has a couple of differences. Java programmers use the length() method on a variety of classes, so it made sense for the Java version to use a length() method as well. C# has the idea of a property, which is nothing more than syntactic sweetness that allows the API developer to encapsulate the implementation of a variable, but allow access to it as if it were a public field. The end result is that instead of writing:

for (int i = 0; i < hits.length(); i++)

in Java, you'd use this in C#:

for (int i = 0; i < hits.Length; i++)

The authors of NLucene also decided to use the C# indexer functionality (which I wrote about a couple days ago) so that an instance of the Hits class can be accessed as if it were an array:

Document doc = hits[i].Document;

I put together a complete sample that you can download and compile yourself if you're interested in using NLucene. Download it here.

Lucene’s Query API

Erik Hatcher wrote an excellent article on Lucene’s Query API, specifically on how the QueryParser class uses the Query subclasses including TermQuery, PhraseQuery, RangeQuery, WildcardQuery, PrefixQuery, FuzzyQuery and BooleanQuery. Very useful stuff.

Not surprisingly, he’s also writing a book on Lucene titled “Lucene in Action”, to be published by Manning.

2003 Lightweight Languages Workshop notes

Joe and I went to the Lightweight Languages Workshop at MIT this past Saturday. In short, a bunch of nerds got together for the entire day to talk about stuff like Haskell, Lisp, Lua, Scheme, and boundaries. If you’re at all interested, you can watch the webcasts (which are of surprisingly good quality) in Real Media or Windows Media here, and Joe has already written up his notes. I’m a bit behind, so here are mine a couple days late.

The initial session “Toward a LL Testing Framework for Large Distributed Systems” was especially interesting to me for a couple of reasons: a) it was based on technology deployed for DARPA called UltraLog, b) UltraLog is an “… ultra-survivable multi-agent society” and c) it uses Jabber, Python and Ruby. Specifically, they used these technologies to get an up close and personal look at how their entire system (in this case thousands of agents) was performing. Said another way, they created a way to quickly and unobtrusively gather information from a variety of data points while the program is running. If you have a single website on one server, this problem doesn’t matter to you much. But imagine a system of 5000 servers (which is something I was asked to imagine this past Monday, more on that at a later time). An application running on 5000 servers would generate an unusable amount of information; simple logging statements won’t help you. I’m rambling though. The interesting takeaway from all this is the idea of creating instrumentation for your applications [google search for ‘code instrumentation’].
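To make the instrumentation idea a little more concrete, here is a minimal Java sketch of the kind of thing I have in mind. It has nothing to do with how UltraLog actually does it; the `Instrumentation` class and its `timed`, `calls` and `snapshot` methods are names I made up. The point is only that the program keeps cheap in-memory counters that something else can poll while it runs, instead of writing a log line for every event.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper: counts calls and accumulates elapsed time per named operation.
final class Instrumentation {
    private static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    // Runs the work and records one call plus its elapsed time under the given name.
    static <T> T timed(String name, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            CALLS.computeIfAbsent(name, k -> new LongAdder()).increment();
            NANOS.computeIfAbsent(name, k -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    // Number of recorded calls for a name, or 0 if it was never used.
    static long calls(String name) {
        LongAdder counter = CALLS.get(name);
        return counter == null ? 0 : counter.sum();
    }

    // Total elapsed nanoseconds recorded for a name, or 0 if it was never used.
    static long totalNanos(String name) {
        LongAdder total = NANOS.get(name);
        return total == null ? 0 : total.sum();
    }

    // A sorted copy of the call counts that a monitoring agent could poll.
    static Map<String, Long> snapshot() {
        Map<String, Long> result = new TreeMap<>();
        CALLS.forEach((name, counter) -> result.put(name, counter.sum()));
        return result;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            timed("lookup", () -> "value");
        }
        System.out.println(snapshot());
    }
}
```

On 5000 servers you would still need a way to collect the snapshots, but each one is a handful of numbers rather than a pile of log files.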

URLs harvested from the other sessions:

· Web Authoring System Haskell (WASH)

· XS: Lisp on Lego MindStorms

· the idea of continuations, where:

def foo(x):
  return x+1

becomes:

def foo(x,c):
  c(x+1)

· dynamic proxies [googled] [javaworld.com] [onjava.com]

· The Great Computer Language Shootout

· Lua: embeddable in C/C++, Java, Fortran, Ruby, OPL, C#…. runs on Palm OS, Brew, PlayStation 2, Xbox and Symbian.

· c minus minus

· scheme
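Going back to the continuations item in the list above, here is the same foo example as a small Java sketch, in case the Python shorthand is too terse. The class and method names (`ContinuationSketch`, `fooCps`, `fooTwiceCps`) are mine and purely illustrative. The idea is that instead of returning a value, the function hands its result to a callback, the continuation, which represents "the rest of the program".

```java
import java.util.function.IntConsumer;

final class ContinuationSketch {
    // Direct style: the result is returned to the caller.
    static int foo(int x) {
        return x + 1;
    }

    // Continuation-passing style: the result is passed to the continuation c.
    static void fooCps(int x, IntConsumer c) {
        c.accept(x + 1);
    }

    // Chaining: the continuation of the first call makes the second call.
    static void fooTwiceCps(int x, IntConsumer c) {
        fooCps(x, once -> fooCps(once, c));
    }

    public static void main(String[] args) {
        System.out.println(foo(41));           // direct style
        fooCps(41, System.out::println);       // same result, delivered to a continuation
        fooTwiceCps(40, System.out::println);  // two steps chained through continuations
    }
}
```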

CFX_Lucene updates

A couple of people have written to me in the last few days with updates they’ve made to the Lucene and ColdFusion tags I wrote a couple months ago.

First, Nick Burch from Torchbox updated the CFX tag so that it “… behaves better under error conditions and … the command line debug now works.” I also read here that they (Torchbox) are hoping to release an open source package written in Java “… to convert file types to plain text and a CF custom tag to interface to Lucene which will search them.”

Today, Scott piped in with a nice addition that adds the score to the query returned to the calling tag:

// Define column indexes
String[] columns = { "URL", "TITLE", "SUMMARY", "SCORE" } ;

// loop over all the results, add each to the query
for (int i = 0; i …

For those of you like Scott who want to index PDF and Office documents, I'd suggest you start by taking a look at these jGuru FAQs:

Java Guru: How can I index PDF documents?

Java Guru: How can I index Word documents?

Cheers!

C# Indexers

At Mindseye we’ve written a content management system, which is really just the net result of writing and improving upon a modular code base for a bunch of different websites in a variety of programming languages (ASP.NET, ASP, ColdFusion, and Java). In this content management system, which we’ve affectionately called ‘Element’, a website is distilled down into various ‘objects’: things like events, newsletters, products, etc. Long story short, each object (in whatever language) is usually exposed as a specific type and is represented internally as an XML document. In the .NET/C# version that I’ve been working with lately, a newsletter would look vaguely like this:

public class Newsletter {
  private XmlDocument newsletter;
  public Newsletter(XmlDocument doc) {
   this.newsletter = doc;
  }
  // other methods left out
}

Putting aside your opinion on this being a good or bad design for a second, I’ve always struggled with how to best make the elements of the encapsulated xml document available to external classes. Right now I have a method:

public string GetProperty(string label) {
  // retrieve the appropriate element and return as string
  return theValue;
}

and this works pretty well, but it’s lengthy. Another way of doing it would be to make each element of the XmlDocument a public property, but this would require a lot of typing and would require that you recompile the class every time the data structure it represented changed. So tonight, during nerd time (i.e. extended time by myself at Barnes and Noble) I read about C# indexers. You’ve probably used an indexer before; for instance the NameValueCollection class contains a public property called Item, which itself is an indexer for a specific entry in the NameValueCollection. Unbeknownst to me before tonight, you can create your own indexers, so instead of having to access an element of an object like this:

string newsletterLabel = newsletterInstance.GetProperty("label");

you could instead use this syntax:

string newsletterLabel = newsletterInstance["label"];

which just feels more natural to me. Implementing the indexer in your class is simple. Using the example ‘Newsletter’ class above:

public class Newsletter {
  private XmlDocument newsletter;
  public Newsletter(XmlDocument doc) {
   this.newsletter = doc;
  }
  // indexer
  public string this [string index] {
    get {
      // logic to retrieve appropriate element from xmldoc by name
      return theValue;
    }
  }
}

I’m guessing that generally the index will be an integer rather than the string that I have above. Nothing much changes if the parameter is an integer:

public string this [int index] {
  get {
    // logic to retrieve appropriate element from xmldoc by index
    return theValue;
  }
}

Extremely handy stuff to know! Couple more thoughts on the subject:

· Indexers
· Comparison Between Properties and Indexers
· Properties
· Developer.com: Using Indexers

C# static constructors, destructors

Spent some more time on the C# logging class I’ve been working on. Per Joe’s suggestions, I modified the StreamWriter so that it is now a class variable and as such, it need not be initialized every time the class is used. Instead, you can use a static constructor (the idea exists in both C# and Java, although it goes by different names). In C#, you simply put the keyword ‘static’ in front of a parameterless constructor:

public class Logger {
 static Logger() {
  // insert static resource initialization code here
 }
}

In Java it’s called a static initialization block (more here) and it looks like this:

public class Logger {
 static {
  // insert static resource initialization code here
 }
}

If you’d like a real life example of a Java static initialization block, check out the source for the LogManager class in the log4j package.
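If you just want to see the behavior without reading log4j, here is a tiny self-contained Java sketch. The `StaticInitSketch` class and its members are made up for the demo. It shows the one property that matters for the logger: the static block runs exactly once, the first time the class is used, no matter how many times the static methods are called afterwards.

```java
final class StaticInitSketch {
    // Counts how many times the static block has run.
    static int initCount = 0;

    // Shared resource, standing in for the logger's writer.
    private static final StringBuilder BUFFER;

    static {
        // Static resource initialization: runs once, when the class is initialized.
        BUFFER = new StringBuilder();
        initCount++;
    }

    static synchronized void log(String message) {
        BUFFER.append(message).append('\n');
    }

    static synchronized String contents() {
        return BUFFER.toString();
    }

    public static void main(String[] args) {
        log("first");
        log("second");
        System.out.println("static block ran " + initCount + " time(s)");
        System.out.print(contents());
    }
}
```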

Anyway, now the Logger class declares a StreamWriter:

private static StreamWriter sw;

and then uses the static constructor to initialize it for writing to the text file:

static Logger() {
  // load the logging path
  // ...
  // if the file doesn't exist, create it
  // ...
  // open up the streamwriter for writing..
  sw = File.AppendText(logDirectory);
}

Then use the lock keyword when writing to the resource to make sure that only one thread at a time can write to it:

 ...
 lock(sw) {
   sw.Write("\r\nLog Entry : ");
   ...
   sw.Flush();
 }
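For the Java folks, the counterpart of the C# lock statement is a synchronized block. The sketch below is my own illustration rather than a port of the Logger class: `SharedLogSketch` and its methods are hypothetical names, and a StringWriter stands in for the log file so nothing touches the disk. Several threads write through one shared writer, and the lock keeps each entry in one piece.

```java
import java.io.StringWriter;

final class SharedLogSketch {
    // One writer shared by every caller; a StringWriter stands in for the log file.
    private static final StringWriter SW = new StringWriter();
    private static final Object LOCK = new Object();

    static void log(String message) {
        // Only one thread at a time may run this block, so the three writes
        // that make up an entry are never interleaved with another thread's.
        synchronized (LOCK) {
            SW.write("Log Entry : ");
            SW.write(message);
            SW.write("\r\n");
            SW.flush();
        }
    }

    // Returns the entries written so far, one per array element.
    static String[] entries() {
        synchronized (LOCK) {
            String text = SW.toString();
            return text.isEmpty() ? new String[0] : text.split("\r\n");
        }
    }

    // Starts the given number of threads, each writing perThread entries, and waits for them.
    static void writeFromThreads(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    log("thread " + id + " entry " + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        writeFromThreads(3, 5);
        System.out.println(entries().length + " entries written");
    }
}
```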

Now all objects that call the various static methods will be using the same private streamwriter. But you’re left with one more problem. The streamwriter is never explicitly closed. If the StreamWriter was an instance variable, then we could solve this by implementing a destructor. The destructor would take this form:

~Logger() {
  try {
    sw.Close();
  } catch {
    // do nothing, exit..
  }
}

However, in this case the StreamWriter is a static (class) variable and no instance of Logger ever exists in the system, so the ~Logger destructor will never get called. Instead, the underlying file handle is released when the stream is finalized by the garbage collector or when the process exits. One caveat: a StreamWriter does not flush its own buffer during finalization, so the sw.Flush() call after each write is what guarantees that every entry actually reaches the file.
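If you would rather close a static writer explicitly, one option on the Java side is a shutdown hook registered from the static initializer. I have not done this in the Logger class; the sketch below is only an illustration with made-up names (`ClosingLoggerSketch`, `isClosed`), and it uses a StringWriter instead of a real file. In .NET, the AppDomain.CurrentDomain.ProcessExit event looks like the closest counterpart, though I have not tried it here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

final class ClosingLoggerSketch {
    // Static writer shared by all callers; a StringWriter stands in for the log file.
    private static final Writer SW = new StringWriter();
    private static boolean closed = false;

    static {
        // Registered once, when the class is first used; runs when the JVM shuts down normally.
        Runtime.getRuntime().addShutdownHook(new Thread(ClosingLoggerSketch::close));
    }

    static synchronized void log(String message) {
        if (closed) {
            return; // ignore writes after the writer has been closed
        }
        try {
            SW.write(message + "\r\n");
            SW.flush();
        } catch (IOException e) {
            // do nothing, exit..
        }
    }

    // Safe to call more than once; only the first call closes the writer.
    static synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        try {
            SW.close();
        } catch (IOException e) {
            // do nothing, exit..
        }
    }

    static synchronized boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        log("hello");
        System.out.println("closed before exit: " + isClosed());
        // The shutdown hook closes the writer after main returns.
    }
}
```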

I updated the Logger class and its personal testing assistant TestLogger (which has also been updated to use 3 threads). You can download them here:

· Logger.cs
· TestLogger.cs