Category Archives: Software Development

The intricacies of HTTP

I’ve been working on a small piece of C# software this week that posts data to an HTTP server (which handles credit card processing), parses the results, and returns them to a C# client. Pretty easy to do, right? First you create an HttpWebRequest object:

String url = "http://server/path";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

and then you post the data:

byte[] requestBytes = System.Text.Encoding.ASCII.GetBytes (some_data);
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";
req.ContentLength = requestBytes.Length;
Stream requestStream = req.GetRequestStream();
requestStream.Write(requestBytes,0,requestBytes.Length);
requestStream.Close();

Finally, you retrieve the HTML returned from the server:

// note: exception handling removed for easier reading
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream(), System.Text.Encoding.ASCII);
String line = sr.ReadToEnd();
sr.Close();

The reason I was working on it was that the application was intermittently throwing exceptions of the form:

Error reading response stream: System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive.

Googling for this error message didn’t leave me with much. There were a smattering of posts on various web forums about the error, but not a whole lot of solutions. Long story short, I fired up TcpTrace and modified the KeepAlive property (setting it to false) of the HttpWebRequest object on a whim and voila! The application worked again. Best I can tell the HTTP server I’m working against doesn’t handle HTTP posts using Connection: Keep-Alive properly. For whatever reason it decides that the third request in a Keep-Alive connection should be closed.
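For reference, the fix was a one-property change when setting up the request (a minimal sketch, reusing the url variable from the first snippet above):

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
// work around the misbehaving server by turning off persistent connections
req.KeepAlive = false;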

Broadly, the reason I bring this up is because I think it’s important for all web developers to have an in-depth understanding of what’s going on under the hood of HTTP. Knowing the advantages and disadvantages of things like the HTTP Keep-Alive header becomes invaluable whenever you have to drop down to manually sending and receiving HTTP.

More pointedly, it was interesting to find out a couple tidbits about how .NET handles HTTP connections. First, by default .NET is configured (via machine.config) to use whatever proxy settings you have for Internet Explorer. You can turn this off by modifying the:

 configuration/system.net/defaultProxy/proxy

element. Second, also by default, machine.config only allows .NET applications to make 2 persistent connections to external resources. You can modify/view this as well:

 configuration/system.net/connectionManagement
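For the curious, the relevant chunk of machine.config looks roughly like this (a sketch, not a copy of the real file; the attribute values are just examples):

<configuration>
  <system.net>
    <defaultProxy>
      <!-- stop .NET from picking up the Internet Explorer proxy settings -->
      <proxy usesystemdefault="false" />
    </defaultProxy>
    <connectionManagement>
      <!-- default is 2 persistent connections per host; raise it if you need more -->
      <add address="*" maxconnection="2" />
    </connectionManagement>
  </system.net>
</configuration>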

Finally, HttpWebRequest and its parent WebRequest are, again by default, set to use Keep-Alive connections.

Logging in C#: enumerations, thread-safe StreamWriter

Joe gave me some great feedback on the C# logging utility I wrote about a couple months ago. Per his suggestions, I modified it in the following ways:

1) Instead of using public static int variables as levels, I added an enumeration:

enum Priority : int {
  OFF = 200,
  DEBUG = 100,
  INFO = 75,
  WARN = 50,
  ERROR = 25,
  FATAL = 0
}

An enumeration is a value type (i.e. the enumeration is not a full-fledged object) and thus is allocated on the stack. I’m guessing that Joe suggested the use of an enumeration for two reasons. First, an enumeration groups the constants together… in some sense it encapsulates what was a group of unrelated static integers into a single type, in this case named ‘Priority’. Second, because enumerations are value types (and thus are allocated on the stack), they require fewer processor and memory resources than full objects would.

2) Joe mentioned “… you probably need to put a lock{} around the calls to it (StreamWriter) — it’s not guaranteed to be threadsafe.” Turns out he’s right (not that it was a surprise). The StreamWriter documentation has this to say: “Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.” But the solution is even easier than putting a lock{} around it. StreamWriter extends the TextWriter class, which itself has a static method for generating a thread-safe wrapper. So where in the first version I had this:

StreamWriter sw = File.AppendText(filePath);

I now have this:

TextWriter tw = TextWriter.Synchronized(File.AppendText(filePath));

The File.AppendText() method returns a StreamWriter object, which the TextWriter.Synchronized() method wraps to create a thread-safe TextWriter that can be used just like a StreamWriter.
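To show how it slots in, here’s a rough sketch of the relevant bit of the Logger class (the file name and formatting are just placeholders, not the exact code from Logger.cs):

using System;
using System.IO;

public class Logger {
  // one synchronized writer shared by every thread in the process
  private static TextWriter tw =
    TextWriter.Synchronized(File.AppendText("application.log"));

  public static void Append(String message, int level) {
    // WriteLine on the synchronized wrapper is safe to call concurrently
    tw.WriteLine("{0} [{1}] {2}", DateTime.Now, level, message);
    tw.Flush();
  }
}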

3) I noticed that the log4j implementation uses wrapper methods to make the argument list shorter. For instance, the Logger class has methods that look like this:

public void debug(Object message);
public void info(Object message);
public void warn(Object message);
public void error(Object message);
public void fatal(Object message);

I added the same idiom to my Logger class:

public static void Debug(String message) {
  Logger.Append(message, (int)Priority.DEBUG);
}

while still allowing for the more verbose:

public static void Append(String message, int level)
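Usage then looks like this (a quick sketch; both lines write the same entry, assuming the Priority enum is visible to the caller):

Logger.Debug("opening database connection");
Logger.Append("opening database connection", (int)Priority.DEBUG);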

I uploaded the source and a test so you all can have a hack at it, if that kind of thing toots your horn:
· Logger.cs
· TestLogger.cs
I’m *always* open to comments and feedback. If you have even an inkling as to what I could do better with this code, *please* add your thoughts below.

Fail-Safe Amazon Image… using Java, C# & ColdFusion

Paul of onfocus.com fame (and the fabulous SnapGallery tool) recently wrote an article for the O’Reilly Network that (I think) is an excerpt from his new book “Amazon Hacks”. Anyway, he shows how you can check whether an image exists on amazon.com using ASP, Perl, and PHP, and I thought it would be fun to show how to do the same thing in Java, C# and ColdFusion. His examples were all functions of the form:

Function hasImage(imageUrl)

so I’m following that style. In Java you’d end up with something like this:

public static boolean hasImage(String url) {
  boolean result = false;
  try {
    URL iurl = new URL(url);
    HttpURLConnection uc = (HttpURLConnection)iurl.openConnection();
    uc.connect();
    if (uc.getContentType().equalsIgnoreCase("image/jpeg")) {
      result = true;
    }
    uc.disconnect();
  } catch (Exception e) {
  }
  return result;
}

In C#, almost the exact same thing:

public static Boolean HasImage(String url) {
  Boolean result = false;
  try {
    HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(url);
    WebResponse res = webreq.GetResponse();
    if (res.ContentType == "image/jpeg") {
      result = true;
    }
    res.Close();
  } catch {
  }
  return result;
}

and then in ColdFusion:

<cffunction name="hasImage" returntype="boolean" output="no">
  <cfargument name="imageUrl" type="string" required="yes">
  <cfhttp url="#imageURL#" method="GET">
  <cfif cfhttp.responseHeader["Content-Type"] EQ "image/jpeg">
    <cfreturn true>
  <cfelse>
    <cfreturn false>
  </cfif>
</cffunction>

The full source for all these examples is available:

· Amazon.java
· Amazon.cs
· amazon.cfm
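If you want to kick the tires, a call to the C# version looks something like this (the class name comes from Amazon.cs, and the URL is just a placeholder — drop in a real Amazon image URL):

String url = "http://images.amazon.com/images/P/XXXXXXXXXX.01.MZZZZZZZ.jpg"; // hypothetical ASIN
if (Amazon.HasImage(url)) {
  Console.WriteLine("image exists");
} else {
  Console.WriteLine("no image");
}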

Enjoy!

Spidering Hacks

I fielded a couple questions this week about search engine safe URLs, both along the lines of a) how do you create them? and b) are they even worth it? I’ve written before about how you can create them using Apache, but one of the things I didn’t mention was that I think writing your own spider, or at least attempting to, is a great first step toward understanding why search engine safe URLs are important. To that end, I’d suggest the “Spidering Hacks” book that O’Reilly just released as a great starting point. The book uses Perl quite extensively, but it’s the process that matters. I’ve picked up “Programming Spiders, Bots, and Aggregators in Java” at Barnes and Noble quite a few times as well, but have never pulled the trigger.

If you’d rather read code, you can download the spider/indexing engine I’ve been working on (was working on!) to get some kind of idea of what goes into a spider.

Martin Fowler @ NEJUG: Software Design in the Twenty-first Century

I attended the NEJUG meeting in Lowell last week that Martin Fowler spoke at. I was the guy in the back furiously typing notes, which I’m presenting for your pleasure here, revised and polished.

Martin started out by saying that he didn’t know exactly what to talk about and then he launched into a discussion about the completely new version of UML that is very near completion, UML 2.0.

· newest version of uml has a lot of revisions to the metamodel,
— in lieu of people thinking that uml is all about diagrams and people not really caring all that much about diagrams
— 3 ways in which uml is commonly used: sketches, blueprints and languages

· sketches
— draw diagrams to communicate ideas about software
— not precise, just an overall picture
— main idea is to convey ‘ideas’
— most books that use UML are completely wrong in their use of uml (and almost all are sketches)
— martin’s favorite uml drawing tool? a printing whiteboard
— no whiteboard? then use visio (templates are available on martin’s webpage of links)
— togetherJ (an IDE?) has built-in UML support for sketching

· blueprints
— software development as an engineering process
— martin doesn’t agree w/ this metaphor
— one body of people who “design” and make all the important design decisions, hand off to another group for implementation
— reality is that no one cares if the software matches the diagram
— in real civil engineering the designers check to make sure that the end result matches the original design
— both parties need to be intimately familiar with the intricacies of the UML specification for blueprints to work

· uml as a programming language
— “executable UML”
— “model driven architecture”
— graphics matter less, the meta model takes precedence in this way of thinking

· the people driving the standard are heavily involved with both the blueprint and the uml-as-a-programming-language directions… which means that not many people are thinking about those who use uml as a sketching tool
— thus, almost all the changes in uml2 are for uml as a programming language
— these people think that in 10/20 years no one will be programming in java/c#, but rather in UML
— CASE people said the same thing in the 80’s

couple arguments that might make this possible:
a) UML is a standard where CASE was a company (examples: SQL, Java…)
   1) however, it’s a pretty complex standard and not everything is agreed upon
   2) a lot of things are open to interpretation
   3) subsets of UML aren’t all made for executable UML
   4) diagrams can’t be transferred from one tool to another without a loss of data

b) malapropism for platform independence
— build a platform independent model (without regard to programming language)
— then do a build to a platform specific model that would then build the source code
— uml people think of platforms as “java” or “.net”

c) uml hasn’t yet approached the discussion of the libraries
— ie: it can’t write “hello world” yet

sequence diagrams were so easy that anyone could understand them just by looking

interaction diagrams, which are new to 2.0, are much more complicated

mf doesn’t think that code generation will *really* be all that much more productive than regular programming in Java or C#

mf quotes an engineer who said: “engineering is different from programming in that engineering has tolerance while programming is absolute.”

uml’s success as a programming language hinges on its ability to make people more productive

mf thinks that there will be a growing divergence between the sketchers and the blueprinters/executable people

prescriptive vs. descriptive
— uml is increasingly becoming descriptive, not prescriptive

structural engineers use no “standard” drawing diagrams, but simply follow “accepted” rules… a trend that MF thinks UML will probably follow: we’ll all use UML, but not necessarily according to the specification

questions that came from the audience

Notes on Peer-To-Peer: Harnessing the Power of Disruptive Technologies

Peer-To-Peer (amazon, oreilly) is an old book by internet standards (published in March of 2001), but chock full of interesting thoughts and perspectives.

· On gnutella, did you know that you can watch what other people are searching for? The book has a screenshot of gnutella client v0.56; I use Gnucleus, where you can do the same thing by clicking Tools –> Statistics –> Log. Betcha you didn’t know that an entire company was founded on that idea, did you?

· footnote: Chaffing and Winnowing: Confidentiality without Encryption by Ronald L. Rivest

· On the ‘small-world’ model and the importance of bridges: “.. The key to understanding the result lies in the distribution of links within social networks. In any social grouping, some acquaintances will be relatively isolated and contribute few new contacts, whereas others will have more wide-ranging connections and be able to serve as bridges between far-flung social clusters. These bridging vertices play a critical role in bringing the network closer together… It turns out that the presence of even a small number of bridges can dramatically reduce the lengths of paths in a graph, …” — Reference “Collective dynamics of ‘small-world’ networks” published in Nature, download the PDF.

· Publius looks like some cool software

· Crowds: “Crowds is a system whose goals are similar to that of mix networks but whose implementation is quite different. Crowds is based on the idea that people can be anonymous when they blend into a crowd. As with mix networks, Crowds users need not trust a single third party in order to maintain their anonymity. A crowd consists of a group of web surfers all running the Crowds software. When one crowd member makes a URL request, the Crowds software on the corresponding computer randomly chooses between retrieving the requested document or forwarding the request to a randomly selected member of the crowd…. ” — Read more about Crowds at the official site.

· Tragedy of the Commons. This idea was mentioned in chapter 16 on Accountability and is talked about in various other books I’ve read, but I’m not sure that I ever recorded the source. The idea came from Garrett Hardin in a paper written in 1968 called “The Tragedy of the Commons”, which you can read on his site.

· On accountability and how we really *are* all just six degrees apart. Read the PGP Web of Trust statistics if you don’t believe it.

· On reputation systems, specifically Advogato’s Trust Metric

· On reputation scoring systems, a good system “… will possess many of the following qualities:

  • Accurate for long-term performance: The system reflects the confidence (the likelihood of accuracy) of a given score. It can also distinguish between a new entity of unknown quality and an entity with bad long-term performance.
  • Weighted toward current behavior: The system recognizes and reflects recent trends in entity performance. For instance, an entity that has behaved well for a long time but suddenly goes downhill is quickly recognized and no longer trusted.
  • Efficient: It is convenient if the system can recalculate a score quickly. Calculations that can be performed incrementally are important.
  • Robust against attacks: The system should resist attempts of any entity or entities to influence scores other than by being more honest or having higher quality.
  • Amenable to statistical evaluation: It should be easy to find outliers and other factors that can make the system rate scores differently.
  • Private: No one should be able to learn how a given rater rated an entity except the rater himself.
  • Smooth: Adding any single rating or small number of ratings doesn’t jar the score much.
  • Understandable: It should be easy to explain to people who use these scores what they mean — not only so they know how the system works, but so they can evaluate for themselves what the score implies.
  • Verifiable: A score under dispute can be supported with data.”

· On reputation scoring systems: “Vulnerabilities from overly simple scoring systems are not limited to “toy” systems like Instant Messenger. Indeed, eBay suffers from a similar problem. In eBay, the reputation score for an individual is a linear combination of good and bad ratings, one for each transaction. Thus, a vendor who has performed dozens of transactions and cheats on only 1 out of every 4 customers will have a steadily rising reputation, whereas a vendor who is completely honest but has done only 10 transactions will be displayed as less reputable. As we have seen, a vendor could make a good profit (and build a strong reputation!) by being honest for several small transactions and then being dishonest for a single large transaction.”

· The book was written when Reputation Technologies was still a distinct company, but I thought this list of Reputation and Asset Management vendors was interesting in that reputation is something that is becoming more and more important… for instance, when was the last time you purchased something from eBay where the vendor had a bad rating? Never, right? Did you ever stop to think about how the vendor in question got a bad rating? Since when is eBay a good judge of someone’s character? Why do we trust eBay’s reputation algorithms?

· On the optimal size of an organization: “Business theorists have observed that the ability to communicate broadly and deeply through the Internet at low cost is driving a process whereby large businesses break up into a more competitive system of smaller component companies. They call this process ‘deconstruction.’ This process is an example of Coase’s Law, which states that other things being equal, the cost of a transaction — negotiating, paying, dealing with errors or fraud — between firms determines the optimal size of the firm. When business transactions between firms are expensive, it’s more economical to have larger firms, even though larger firms are considered less efficient because they are slower to make decisions. When transactions are cheaper, small firms can replace the larger integrated entity.”

· “Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0 (pdf)” from Alma Whitten

The Philosophy of Ruby: An interview with Yukihiro Matsumoto

Bill Venners just posted the first installment of a series of articles with Yukihiro Matsumoto, the creator of the programming language Ruby. Specifically, they talk about how Ruby wasn’t designed to be the ‘perfect’ language (but rather a language that feels good when used), and “… the danger of orthogonality, granting freedom with guidance, the principle of least surprise and the importance of the human in computer endeavors.”

I thought the quote “Language designers want to design the perfect language.” could also be re-phrased as “Programmers want to feel like their language is the perfect language.” I know this blog is being syndicated through fullasagoog.com (as a ColdFusion blog) and also through markme.com (as a Java blog), and I read a lot of the blogs on both sites, as well as some of the blogs on weblogs.asp.net and javablogs.com. It’s interesting that all of the above-mentioned sites (not to mention slashdot) are generally short-sighted when it comes to the subject of which language is better (reference discussions re: Java as the SUV of programming languages, PHP vs. ASP.NET, MX vs. .NET) and hammer away at how x is better than y. I think Yukihiro is right: there isn’t a ‘perfect’ programming language and there never will be. Macromedia employees probably aren’t encouraged to say this, but I’d encourage anyone writing a ColdFusion application to try to write a similar application in ASP.NET, or in Java using Struts, or in ASP… or even Ruby. You’ll be amazed at how much you’ll learn.

re: Features Talk, but Behaviors Close

From the cooper.com newsletter, an article that clearly and poignantly articulates a discussion I’ve struggled with many times at work. As part of our proposal process we’re constantly putting together “feature lists” (which admittedly we do combine with more in-depth lists and descriptions) that describe each and every entity included in the project.

Discussions of a software product in terms of its features were intended to serve as a bridge between constituents who otherwise had few terms in common: users and software developers. Users want a product to satisfy their goals (why else use a productivity application?), while software developers need to know what to build (otherwise they will just make it up themselves). Meanwhile, marketers and sales folks want to discuss the characteristics of a forthcoming product. So everybody has been instructed to talk in terms of “features.” Customer needs are translated into a marketing requirements document, which serves as a vague template for software construction. But what started out as a bridge—features—has broken apart. Users stand at one anchorage and product developers stand at the other, both scratching their heads at the expanse of shark-infested waters still separating them.

In short, the solution:

While part of the discussion can take place using the language of features (for instance, the IT guy is going to want to know whether the product has “128-bit encryption”), the best opportunities and longest-lasting relationships are going to come when the language of goals and behaviors is introduced, because then you’re in the business of solving personal goals and organizational objectives, rather than feature checklists.

Please send this to every software sales, marketing and project manager that you know. No more “features”!