All posts by ajohnson

The intricacies of HTTP

I’ve been working on a small piece of C# software this week that posts data to an HTTP server (which handles credit card processing), parses the results and then returns them to a C# client. Pretty easy to do, right? First you create an HttpWebRequest object:

String url = "http://server/path";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

and then you post the data:

byte[] requestBytes = System.Text.Encoding.ASCII.GetBytes (some_data);
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";
req.ContentLength = requestBytes.Length;
Stream requestStream = req.GetRequestStream();
requestStream.Write(requestBytes,0,requestBytes.Length);
requestStream.Close();

Finally, you retrieve the HTML returned from the server:

// note: exception handling removed for easier reading
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream(), System.Text.Encoding.ASCII);
String line = sr.ReadToEnd();
sr.Close();

I was working on it because the application was intermittently throwing exceptions of the form:

Error reading response stream: System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive.

Googling for this error message didn’t leave me with much. There was a smattering of posts on various web forums about the error, but not a whole lot of solutions. Long story short, I fired up TcpTrace and, on a whim, set the KeepAlive property of the HttpWebRequest object to false, and voila! The application worked again. Best I can tell, the HTTP server I’m working against doesn’t handle HTTP posts using Connection: Keep-Alive properly. For whatever reason it closes the connection on the third request sent over a Keep-Alive connection.
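For reference, the fix amounts to one extra line when the request is built. This is a minimal sketch rather than the production code: the class and method names are made up for illustration, and nothing is sent over the wire until GetRequestStream() or GetResponse() is called.

```csharp
using System;
using System.Net;

public static class KeepAliveExample {
  // Builds the POST request with persistent connections turned off,
  // so the connection is not reused for later requests.
  public static HttpWebRequest CreateRequest(string url) {
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = "POST";
    req.ContentType = "application/x-www-form-urlencoded";
    req.KeepAlive = false; // the default is true
    return req;
  }
}
```

The trade-off is a new TCP connection for every request, which is slower but avoids depending on how the server handles connection reuse.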

Broadly, I bring this up because I think it’s important for all web developers to have an in-depth understanding of what’s going on under the hood of HTTP. Knowing the advantages and disadvantages of things like the HTTP Keep-Alive header becomes invaluable whenever you have to drop down to manually sending and receiving HTTP.

More pointedly, it was interesting to find out a couple tidbits about how .NET handles HTTP connections. First, by default .NET is configured (via machine.config) to use whatever proxy settings you have for Internet Explorer. You can turn this off by modifying the:

 configuration/system.net/defaultProxy/proxy

element. Second, also by default, machine.config only allows .NET applications to make 2 persistent connections to any one host. You can view or modify this as well:

 configuration/system.net/connectionManagement

Finally, HttpWebRequest uses Keep-Alive connections by default: its KeepAlive property starts out as true.
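The connection limit can also be raised from code for a single application instead of editing machine.config. A minimal sketch, assuming ServicePointManager.DefaultConnectionLimit is available in your framework version; the class and method names are made up for illustration:

```csharp
using System;
using System.Net;

public static class ConnectionLimitExample {
  // Raises the per-host connection limit for the current process only;
  // machine.config is left untouched.
  public static void Configure(int maxConnectionsPerHost) {
    ServicePointManager.DefaultConnectionLimit = maxConnectionsPerHost;
  }
}
```

Call it once at startup, before the first request goes out: the new limit only applies to ServicePoint objects created after the change.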

Logging in C#: enumerations, thread-safe StreamWriter

Joe gave me some great feedback on the C# logging utility I wrote about a couple months ago. Per his suggestions, I modified it in the following ways:

1) Instead of using public static int variables as levels, I added an enumeration:

enum Priority : int {
  OFF = 200,
  DEBUG = 100,
  INFO = 75,
  WARN = 50,
  ERROR = 25,
  FATAL = 0
}

An enumeration is a value type (ie: the enumeration is not a full fledged object), so a Priority value is stored inline just like the int underneath it; no separate object has to be allocated for it. I’m guessing that Joe suggested the use of an enumeration for 2 reasons. First, an enumeration groups the constants together… in some sense it encapsulates what was a group of unrelated static integers into a single type, in this case named ‘Priority’. Second, because an enumeration is a value type backed by an int, that grouping is free: it needs no more processor time or memory than the raw integers did.

2) Joe mentioned “… you probably need to put a lock{} around the calls to it (StreamWriter) – it’s not guaranteed to be threadsafe.” Turns out he’s right (not that it was a surprise). The StreamWriter documentation has this to say: “Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.” But the solution is easier than putting a lock{} on it. StreamWriter extends the TextWriter class, which itself has a static method for generating a thread safe wrapper. So where in the first version I had this:

StreamWriter sw = File.AppendText(filePath);

I now have this:

TextWriter tw = TextWriter.Synchronized(File.AppendText(filePath));

The File.AppendText() method returns a StreamWriter object, which the TextWriter.Synchronized() method wraps to create a thread-safe TextWriter which can be used just like a StreamWriter.
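To see the wrapper at work, here is a small sketch in which several threads share one synchronized writer. It writes to an in-memory StringWriter instead of a file so it can run anywhere; that substitution, and the class and method names, are assumptions for the demo and not part of Logger.cs.

```csharp
using System;
using System.IO;
using System.Threading;

public static class SynchronizedWriterExample {
  // Starts several threads that all write to one shared writer and
  // returns the number of complete lines that were written.
  public static int WriteFromThreads(int threadCount, int linesPerThread) {
    StringWriter target = new StringWriter();
    TextWriter tw = TextWriter.Synchronized(target);
    Thread[] threads = new Thread[threadCount];
    for (int i = 0; i < threadCount; i++) {
      int id = i; // copy so each thread captures its own value
      threads[i] = new Thread(() => {
        for (int n = 0; n < linesPerThread; n++) {
          tw.WriteLine("thread " + id + " line " + n);
        }
      });
      threads[i].Start();
    }
    foreach (Thread t in threads) {
      t.Join();
    }
    tw.Flush();
    string[] lines = target.ToString().Split(
        new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
    return lines.Length;
  }
}
```

Each WriteLine() call goes through the wrapper as a single synchronized operation, so lines from different threads may interleave with each other but no line should be torn in half.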

3) I noticed that the log4j implementation uses wrapper methods to make the argument list shorter. For instance, the Logger class has methods that look like this:

public void debug(Object message);
public void info(Object message);
public void warn(Object message);
public void error(Object message);
public void fatal(Object message);

I added the same idiom to my Logger class:

public static void Debug(String message) {
  Logger.Append(message, (int)Priority.DEBUG);
}

while still allowing for the more verbose:

public static void Append(String message, int level)
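Put together, the idiom looks roughly like the sketch below. It is not the uploaded Logger.cs: the SketchLogger name, the in-memory Output writer and the "[LEVEL] message" line format are all made up for illustration, and level filtering is left out entirely.

```csharp
using System;
using System.IO;

public enum Priority : int {
  OFF = 200, DEBUG = 100, INFO = 75, WARN = 50, ERROR = 25, FATAL = 0
}

public static class SketchLogger {
  // Hypothetical destination; a real logger would append to a file.
  public static TextWriter Output = TextWriter.Synchronized(new StringWriter());

  // Short wrappers: one per level, each forwarding to Append().
  public static void Debug(string message) { Append(message, (int)Priority.DEBUG); }
  public static void Info(string message)  { Append(message, (int)Priority.INFO); }
  public static void Warn(string message)  { Append(message, (int)Priority.WARN); }
  public static void Error(string message) { Append(message, (int)Priority.ERROR); }
  public static void Fatal(string message) { Append(message, (int)Priority.FATAL); }

  // The verbose form: the caller supplies the level explicitly.
  public static void Append(string message, int level) {
    // Casting back to Priority makes the level print by name, e.g. "WARN".
    Output.WriteLine("[" + (Priority)level + "] " + message);
  }
}
```

With this in place, SketchLogger.Warn("disk almost full") and SketchLogger.Append("disk almost full", (int)Priority.WARN) write the same line.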

I uploaded the source and a test so you all can have a hack at it, if that kind of thing toots your horn:
· Logger.cs
· TestLogger.cs
I’m *always* open to comments and feedback. If you have even an inkling as to what I could do better with this code, *please* add your thoughts below.

Fail-Safe Amazon Image… using Java, C# & ColdFusion

Paul of onfocus.com fame (and the fabulous SnapGallery tool) wrote an article for the O’Reilly Network recently that (I think) was an excerpt of his recently released book “Amazon Hacks“. Anyway, he shows how you can check to see if an image exists on amazon.com using ASP, Perl, and PHP and I thought it would be fun to show how to do the same thing in Java, C# and ColdFusion. His examples were all functions of the form:

Function hasImage(imageUrl)

so I’m following that style. In Java you’d end up with something like this:

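// requires java.net.URL and java.net.HttpURLConnection
public static boolean hasImage(String url) {
  boolean result = false;
  try {
    URL iurl = new URL(url);
    HttpURLConnection uc = (HttpURLConnection)iurl.openConnection();
    uc.connect();
    // getContentType() returns null when the header is missing
    if ("image/jpeg".equalsIgnoreCase(uc.getContentType())) {
      result = true;
    }
    uc.disconnect();
  } catch (Exception e) {
    // any failure (bad URL, network error) counts as "no image"
  }
  return result;
}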

In C#, almost the exact same thing:

public static Boolean HasImage(String url) {
  Boolean result = false;
  try {
    HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(url);
    WebResponse res = webreq.GetResponse();
    if (res.ContentType == "image/jpeg") {
      result = true;
    }
    res.Close();
  } catch {
    // any failure (bad URL, 404, network error) counts as "no image"
  }
  return result;
}

and then in ColdFusion:

<cffunction name="hasImage" returntype="boolean" output="no">
  <cfargument name="imageUrl" type="string" required="yes">
  <cfhttp url="#imageURL#" method="GET">
  <cfif cfhttp.responseHeader["Content-Type"] EQ "image/jpeg">
    <cfreturn true>
  <cfelse>
    <cfreturn false>
  </cfif>
</cffunction>

The full source for all of these examples is available:

· Amazon.java
· Amazon.cs
· amazon.cfm

Enjoy!

Custom string formatting in C#

Formatting strings for output into various mediums is always a fun… err.. required task. Every language does it differently. C# overloads the ToString() method to format a value using this syntax:

Console.WriteLine(MyDouble.ToString("C"));

where “C” is the format specifier for locale specific currency formatting. If the variable ‘MyDouble’ was 3456 in the example above, on a machine set to US English you’d see:

$3,456.00

printed out. Of course, the fun doesn’t end there. There are a whole boatload of standard numeric formatting specifiers you can use including decimal, number, percent and hexadecimal. But truly the most fun are the custom numeric format strings. Example: Let’s say that your boss wants you to format all product pricing rounded to the nearest dollar without commas (ie: $1224 instead of $1,224.00). Normally, you’d write:

Price: <%= Product.Price.ToString("C") %>

but since you don’t want to have commas, you can use a custom format string:

Price: <%= Product.Price.ToString("$#####") %>

which will produce this:

Price: $1224

How about phone numbers? Don’t they just suck to format? In ColdFusion, you’d have something like this:

(#left(str, 3)#) #mid(str, 4, 3)# - #right(str, 4)#

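where ‘str’ is a string containing the 10 digit phone number. In C#, as long as ‘phone’ is a numeric type such as a long (custom format strings don’t apply to strings), you can write this: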

phone.ToString("(###) ### - ####");

Pretty concise isn’t it?
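Here are the three formats from this post in one place, as a small sketch you can paste into a project. The FormatExample class and its method names are made up for illustration, and the cultures are pinned explicitly so the output doesn’t depend on the machine’s regional settings.

```csharp
using System;
using System.Globalization;

public static class FormatExample {
  static readonly CultureInfo UsEnglish = CultureInfo.GetCultureInfo("en-US");

  // Standard currency format: symbol, group separators and two decimals.
  public static string Currency(decimal amount) {
    return amount.ToString("C", UsEnglish);
  }

  // Custom format: a literal "$" followed by digits only, rounded to a whole number.
  public static string WholeDollars(decimal amount) {
    return amount.ToString("$#####", CultureInfo.InvariantCulture);
  }

  // Custom format applied to a phone number stored as a number.
  public static string Phone(long digits) {
    return digits.ToString("(###) ### - ####", CultureInfo.InvariantCulture);
  }
}
```

FormatExample.Currency(3456m) returns "$3,456.00", FormatExample.WholeDollars(1224.00m) returns "$1224" and FormatExample.Phone(6175551212L) returns "(617) 555 - 1212". One caveat: the # placeholder drops leading zeros, so the phone trick only works for numbers that don’t start with 0.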

Spidering Hacks

I fielded a couple questions this week about search engine safe URLs, both of them along the lines of a) how do you create them? and b) are they even worth it? I’ve written about how you can create them using Apache before, but one of the things I didn’t mention was that I think writing your own spider.. or at least attempting to, is a great first step to understanding why search engine safe URLs are important. To that end, I’d suggest the “Spidering Hacks” book that O’Reilly just released as a great starting point. The book uses Perl quite extensively, but it’s the process that matters. I’ve picked up “Programming Spiders, Bots, and Aggregators in Java” at Barnes and Noble quite a few times as well, but have never pulled the trigger.

If you’d rather read code, you can download the spider/indexing engine I’ve been working on (was working on!) to get some kind of idea of what goes into a spider.

Martin Fowler @ NEJUG: Software Design in the Twenty-first Century

I attended the NEJUG meeting in Lowell last week that Martin Fowler spoke at. I was the guy in the back furiously typing notes, which I’m presenting for your pleasure here, revised and polished.

Martin started out by saying that he didn’t know exactly what to talk about and then he launched into a discussion about the completely new version of UML very near to completion, UML 2.0.

· newest version of uml has a lot of revisions to the metamodel,
— in lieu of people thinking that uml is all about diagrams and people not really caring all that much about diagrams
— 3 ways in which uml is commonly used: sketches, blueprints and languages

· sketches
— draw diagrams to communicate ideas about software
— not precise, just an overall picture
— main idea is to convey ‘ideas’
— most books that use UML are completely wrong in their use of uml (and almost all are sketches)
— martin’s favorite uml drawing tool? a printing whiteboard
— no whiteboard? then use visio (templates are available on martin’s webpage of links)
— togetherJ (an ide?) has built in UML support for sketching

· blueprints
— software development as an engineering process
— martin doesn’t agree w/ this metaphor
— one body of people who “design” and make all the important design decisions, hand off to another group for implementation
— reality is that no one cares if the software matches the diagram
— in real civil engineering the designers check to make sure that the end result matches the original design
— both parties need to be intimately familiar with the intricacies of the UML specification for blueprints to work

· uml as a programming language
— “executable UML”
— “model driven architecture”
— graphics matter less, the meta model takes precedence in this way of thinking

· the people that are driving the standard are heavily involved with both the blueprints and the uml as a programming language directions…, which means that not many people are thinking about people that use uml as a sketching tool
— thus, almost all the changes in uml2 are for uml as a programming language
— these people think that in 10/20 years no one will be programming in java/c#, but rather in UML
— CASE people said the same thing in the 80’s

couple arguments that might make this possible
a) UML is a standard where case is a company (examples: SQL, Java..)
   1) however, it’s a pretty complex standard and not everything is agreed upon
   2) a lot of things are open to interpretation
   3) subsets of UML aren’t all made for executable UML
   4) diagrams can’t be transferred from one tool to another without a loss of data

b) malapropism for platform independence
— build a platform independent model (without regard to programming language)
— then do a build to a platform specific model that would then build the source code
— uml people think of platforms as “java” or “.net”

c) uml hasn’t yet approached the discussion of the libraries
— ie: it can’t write “hello world” yet

sequence diagrams were so easy that anyone could understand just by looking

interaction diagrams, which are new to 2.0, are much more complicated

mf doesn’t think that code generation will *really* be all that much more productive than regular programming in Java or C#.

mf quotes an engineer who said: “engineering is different from programming in that engineering has tolerance while programming is absolute.”

uml’s success as a programming language hinges on its ability to make people more productive

mf thinks that there will be a growing divergence between the sketchers and the blueprinters/executable people

prescriptive vs. descriptive
— uml is increasingly becoming descriptive, not prescriptive

structural engineers use no “standard” drawing diagrams, but simply follow “accepted” rules… a trend that MF thinks that UML will probably follow, we’ll all use UML, but not necessarily according to the specification

questions that came from the audience

Notes on Peer-To-Peer: Harnessing the Power of Disruptive Technologies

Peer-To-Peer (amazon, oreilly) is an old book by internet standards (published in March of 2001), but chock full of interesting thoughts and perspectives.

· On gnutella, did you know that you can watch what other people are searching for? The book has a screenshot of gnutella client v0.56, I have Gnucleus, but you can do it in Gnucleus by clicking Tools –> Statistics –> Log. Betcha you didn’t know that an entire company was founded based on that idea did you?

· footnote: Chaffing and Winnowing: Confidentiality without Encryption by Ronald L. Rivest

· On the ‘small-world’ model and the importance of bridges: “.. The key to understanding the result lies in the distribution of links within social networks. In any social grouping, some acquaintances will be relatively isolated and contribute few new contacts, whereas others will have more wide-ranging connections and be able to serve as bridges between far-flung social clusters. These bridging vertices play a critical role in bringing the network closer together… It turns out that the presence of even a small number of bridges can dramatically reduce the lengths of paths in a graph, …” — Reference “Collective dynamics of ‘small-world’ networks” published in Nature, download the PDF.

· Publius looks like some cool software

· Crowds: “Crowds is a system whose goals are similar to that of mix networks but whose implementation is quite different. Crowds is based on the idea that people can be anonymous when they blend into a crowd. As with mix networks, Crowds users need not trust a single third party in order to maintain their anonymity. A crowd consists of a group of web surfers all running the Crowds software. When one crowd member makes a URL request, the Crowds software on the corresponding computer randomly chooses between retrieving the requested document or forwarding the request to a randomly selected member of the crowd…. ” — Read more about Crowds at the official site.

· Tragedy of the Commons. This idea was mentioned in chapter 16 on Accountability and is talked about in various other books I’ve read, but I’m not sure that I ever recorded the source. The idea came from Garrett Hardin in a paper written in 1968 called “The Tragedy of the Commons”, which you can read on his site.

· On accountability and how we really *are* all just six degrees apart. Read the PGP Web of Trust statistics if you don’t believe it.

· On reputation systems, specifically Advogato’s Trust Metric

· On reputation scoring systems, a good system “… will possess many of the following qualities:

  • Accurate for long-term performance: The system reflects the confidence (the likelihood of accuracy) of a given score. It can also distinguish between a new entity of unknown quality and an entity with bad long-term performance.
  • Weighted toward current behavior: The system recognizes and reflects recent trends in entity performance. For instance, an entity that has behaved well for a long time but suddenly goes downhill is quickly recognized and no longer trusted.
  • Efficient: It is convenient if the system can recalculate a score quickly. Calculations that can be performed incrementally are important.
  • Robust against attacks: The system should resist attempts of any entity or entities to influence scores other than by being more honest or having higher quality.
  • Amenable to statistical evaluation: It should be easy to find outliers and other factors that can make the system rate scores differently.
  • Private: No one should be able to learn how a given rater rated an entity except the rater himself.
  • Smooth: Adding any single rating or small number of ratings doesn’t jar the score much.
  • Understandable: It should be easy to explain to people who use these scores what they mean — not only so they know how the system works, but so they can evaluate for themselves what the score implies.
  • Verifiable: A score under dispute can be supported with data.”

· On reputation scoring systems: “Vulnerabilities from overly simple scoring systems are not limited to “toy” systems like Instant Messenger. Indeed, eBay suffers from a similar problem. In eBay, the reputation score for an individual is a linear combination of good and bad ratings, one for each transaction. Thus, a vendor who has performed dozens of transactions and cheats on only 1 out of every 4 customers will have a steadily rising reputation, whereas a vendor who is completely honest but has done only 10 transactions will be displayed as less reputable. As we have seen, a vendor could make a good profit (and build a strong reputation!) by being honest for several small transactions and then being dishonest for a single large transaction.”

· The book was written when Reputation Technologies was still a distinct company, but I thought this list of Reputation and Asset Management vendors was interesting in that reputation is something that is becoming more and more important.. for instance, when was the last time you purchased something from eBay where the vendor had a bad rating? Never right? Did you ever stop to think about how the vendor in question got a bad rating? Since when is eBay a good judge of someone’s character? Why do we trust eBay’s reputation algorithms?

· On the optimal size of an organization: “Business theorists have observed that the ability to communicate broadly and deeply through the Internet at low cost is driving a process whereby large businesses break up into a more competitive system of smaller component companies. They call this process ‘deconstruction.’ This process is an example of Coase’s Law, which states that other things being equal, the cost of a transaction — negotiating, paying, dealing with errors or fraud — between firms determines the optimal size of the firm. When business transactions between firms are expensive, it’s more economical to have larger firms, even though larger firms are considered less efficient because they are slower to make decisions. When transactions are cheaper, small firms can replace the larger integrated entity.”

· “Why Johnny Can’t Encrypt: A Usability Evaluation of PGP 5.0 (pdf)” from Alma Whitten
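The eBay example quoted above is easy to check with a little arithmetic. A rough sketch, assuming the simplest possible linear combination (+1 per good rating, -1 per bad one; the book doesn’t give the actual weights) and a hypothetical cheating vendor with 40 transactions:

```csharp
using System;

public static class ReputationExample {
  // Linear score: +1 for each good rating, -1 for each bad one.
  // The weights are an assumption made for this illustration.
  public static int LinearScore(int goodRatings, int badRatings) {
    return goodRatings - badRatings;
  }
}
```

ReputationExample.LinearScore(30, 10) gives 20 for the vendor who cheated 10 of 40 customers, while ReputationExample.LinearScore(10, 0) gives 10 for the completely honest vendor with 10 transactions, so under this scoring the cheater looks twice as reputable.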

Notes on “Things A Computer Scientist Rarely Talks About”

I picked up “Things A Computer Scientist Rarely Talks About” by Donald Knuth at Barnes & Noble a couple weeks back on a whim after spending 45 minutes looking through the fascinating science/technology section at the back of the Natick store. (sidenote: some Barnes and Nobles have fabulous science/technology/computer science/engineering sections with rows and rows of books… and some have “JavaScript for Dummies”. Why is that?)

It’s not a book about computer science but is rather the transcribed text of his series of public lectures about interactions between faith and computer science (which you can view online). Couple quotes I deemed noteworthy for one reason or another:

· On page 28 he talks about how he used randomization when grading papers while teaching at Stanford. Reminder to read up on “zero knowledge proofs” sometime.

· The basis of his lectures was a book he wrote called “3:16 Bible Texts Illuminated” which aimed to gain an understanding into the Bible by taking 59 random snapshots (verses) and studying them in detail. His son was inspired indirectly by this book: “… to start up the H-20 project, which is designed to answer the question ‘What is Massachusetts?’ … He and my daughter have a book of maps of Massachusetts at a large scale; they live fairly near campus, at coordinates H-20 in the relevant map of Cambridge. So they’re going to try and visit H-20 on all the other pages of their book. That should give terrific insights into the real nature of Massachusetts.”

· on learning: “… I learned that the absolute best way to find out what you don’t understand is to try to express something in your own words. If I had been operating only in input mode, looking at other translations but not actually trying to output the thoughts they expressed, I would never have come to grips with the many shades of meaning that lurk just below the surface. In fact, I would never have realized that such shades of meaning even exist, if I had just been inputting. The exercise of producing output, trying to make a good translation by yourself, is a tremendous help to your education.”

· A quote from Peter Gomes at the beginning of his book called “The Good Book“: “… The notion that [the texts of the Bible] have meaning and integrity, intention, contexts and subtexts, and that they are part of an enormous history of interpretation that has long involved some of the greatest thinkers in the history of the world, is a notion often lost on those for whom the text is just one more of the many means the church provides to massage the egos of its members.”

· One of the questions asked about Douglas Hofstadter’s book “Le Ton Beau de Marot: In Praise of the Music of Language“.

· “My experience suggests that the optimum way to run a research think tank would be to take people’s nice offices away from them and to make them live in garrets, and even to insist that they do non-researchy things. That’s a strange way to run a research center, but it might well be true that the imposition of such constraints would bring out maximum creativity.” — after mentioning that he was able to come up with several relatively important ideas (attribute grammars, Knuth-Bendix completion, LL(k) parsing) during the “most hectic year of his life”.

· On aesthetics according to C. S. Peirce: “Aesthetics deals with things that are admirable; ethics deals with things that are right or wrong; logic deals with things that are true or false.”

· “Somehow the whole idea of art and aesthetics and beauty underlies all the scientific work I do. Whatever I do, I try to do it in a way that has some elegance; I try to create something that I think is beautiful. Instead of just getting a job done, I prefer to do my work in a way that pleases me in as many senses as possible…. I like especially to be associated with art, in the sense of making things of beauty.”

· Planet Without Laughter: “.. It’s a marvelous parable on many levels, about the limits of rationality. You can read it to get insight about all religions, and about the question of form over substance in religion.”

· Eugene Wigner, a Princeton physicist: “It is good that the completion of our scientific work is an unattainable ideal. Striving toward it is attracting many of us, and gives much pleasure and satisfaction… If science were completed, the satisfaction which research, the furthering of human knowledge, had provided, would disappear. Also, even more men would strive for power and domination…. We know that there are facts and insights which we cannot communicate to animals — no animal is familiar, for instance, with the associative law of multiplication… Is it not possible that our understanding of nature also has limitations?… I hope that, even if this should be true, we will be able to continue the extension of our knowledge indefinitely, … even if the limit thereof will always remain widely separated from the complete knowledge and understanding of nature.”

· On artificial life: “… the Game of Life illustrates the power of evolutionary mechanisms. Stable configurations arise out of random soup, usually very quickly; and many of those configurations have properties analogous to biological organisms.”

· Stuart Sutherland, in the 1996 edition of the International Dictionary of Psychology: “Consciousness: The having of perceptions, thoughts and feelings; awareness. The term is impossible to define except in terms that are unintelligible without a grasp of what consciousness means. Consciousness is a fascinating but elusive phenomenon: it is impossible to specify what it is, what it does, or why it evolved. Nothing worth reading has ever been written on it.”