Public Health and Information Technology

Jon Udell posted the transcript of his conversation with Dr. Joel Selanikio (who, as the co-founder of DataDyne, is in the business of collecting public health data in developing countries) last Thursday, right about the same time I finished a book I picked up at my parents’ house while on vacation. My dad is the public health officer for Mono County in California and can’t leave a bookstore without buying a couple of books. He happened to have one by Tracy Kidder (author of The Soul of a New Machine, among other books) titled ‘Mountains Beyond Mountains: Healing the World: The Quest of Dr. Paul Farmer’, the story of Dr. Paul Farmer and his quest to

“… make a difference in solving global health problems through a clear-eyed understanding of the interaction of politics, wealth, social systems and disease.”

It’s a tremendous book. Jon’s blog post reminded me that I had dog-eared a couple of pages I wanted to note here for posterity, and also that it bothered me while reading that information technology seemed to have no role in Dr. Farmer’s plans for improving international public health. It’s fitting that one of the sayings Dr. Farmer is fond of using (from page 177) is:

On what data exactly do you base that statement?

because Dr. Selanikio seems to have the same opinion:

Well, in clinical medicine, the way that we understand things is — if it’s a rash, I look at the rash, I think about it, I look stuff up, but I don’t systematically create a database. For one patient you can juggle the variables in your head. But when you have a population of affected people, you need to collect data and analyze it. That’s the basis of epidemiology.

So there ya go… information technology does have a place.

If you’re interested in the data collection that DataDyne does, check out these YouTube video interviews that Dr. Joel Selanikio did this last May:

instantFeeds 1.0.4

I rolled out a new version of instantFeeds tonight; you can read all about the new features and bug fixes here. The big feature is that notifications now include the title, link and summary of every item in the feed whose publication date is later than the date of the last notification the system sent. A number of people wrote in to tell me they were using it and could use that exact feature.
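The new-item selection rule described above can be sketched in a few lines. To be clear, the item structure and field names here are made up for illustration; this isn't instantFeeds' actual data model:

```python
from datetime import datetime, timezone

# Hypothetical sketch of the rule described above: include every item
# whose publication date is later than the last notification we sent.
def items_since(items, last_notified):
    return [i for i in items if i['published'] > last_notified]

feed = [
    {'title': 'Old post', 'published': datetime(2007, 7, 1, tzinfo=timezone.utc)},
    {'title': 'New post', 'published': datetime(2007, 7, 6, tzinfo=timezone.utc)},
]
last = datetime(2007, 7, 3, tzinfo=timezone.utc)
print([i['title'] for i in items_since(feed, last)])  # → ['New post']
```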

Thanks to jas osborne for the patch that got me started again!

Debugging SOAP / XFire with ethereal

I’ve spent way more time than I should have the last couple of weeks helping migrate a website built against Jive Forums to run against a Clearspace X instance. As part of the migration, one of the things I did was to move all the data syndication that had been done with RSS and custom namespaces over to the Clearspace SOAP API, which is built on a technology called XFire. The first problem I ran into was that the production website was configured so that requests to http://example.com were redirected to http://www.example.com/, which resulted in errors like this in the logs:

Jul 5, 2007 11:30:11 PM org.apache.commons.httpclient.HttpMethodDirector isRedirectNeeded
INFO: Redirect requested but followRedirects is disabled

That error was pretty easy to fix (swap in http://www.example.com in place of http://example.com), but the next thing I ran into was way less intuitive. When I invoked a certain service, I’d get a stack trace that looked like this:

Exception in thread "main" org.codehaus.xfire.XFireRuntimeException: Could not invoke service.. 
Nested exception is org.codehaus.xfire.fault.XFireFault: Unexpected character '-' (code 45) in prolog; expected '<'
 at [row,col {unknown-source}]: [2,1]
org.codehaus.xfire.fault.XFireFault: Unexpected character '-' (code 45) in prolog; expected '<'
 at [row,col {unknown-source}]: [2,1]
	at org.codehaus.xfire.fault.XFireFault.createFault(XFireFault.java:89)
	at org.codehaus.xfire.client.Client.onReceive(Client.java:386)

which was troubling because the exact same SOAP method invocation worked fine both on my local machine and in the test environment. What was different? Two things: the production system was running on Java 6, and it sat behind an Apache HTTP server proxying through mod_caucho, versus no Apache HTTP server / proxy in development or on my machine.

I needed to see what was going on between the server and the client (one of the things that makes SOAP so hard is that you can't just GET a URL to see what's being returned), so at the behest of one of my coworkers I fired up ethereal. I kicked off a couple of SOAP requests with ethereal running, recorded the packets and then analyzed the capture. Said coworker then pointed out the key to debugging HTTP requests with ethereal: right-click on the TCP packet you're interested in and then click 'Follow TCP Stream'. The invocation response looked like this when run against the development environment:

HTTP/1.1 200 OK
Date: Mon, 02 Jul 2007 21:59:30 GMT
Server: Resin/3.0.14
Content-Type: multipart/related; type="application/xop+xml"; start=""; start-info="text/xml"; boundary="----=_Part_5_25686393.1183413571061"
Connection: close
Transfer-Encoding: chunked

1dce

------=_Part_5_25686393.1183413571061
Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"
Content-Transfer-Encoding: 8bit
Content-ID: 
...

and looked like this when invoked against the production instance:

HTTP/1.1 200 OK
Date: Mon, 02 Jul 2007 21:41:56 GMT
Server: Apache/2.0.52 (Red Hat)
Vary: Accept-Encoding,User-Agent
Cache-Control: max-age=0
Expires: Mon, 02 Jul 2007 21:41:56 GMT
Transfer-Encoding: chunked
Content-Type: text/plain; charset=UTF-8
X-Pad: avoid browser bug

24e

------=_Part_29_31959705.1183412516805
Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"
Content-Transfer-Encoding: 8bit
Content-ID: 
...

Notice the different content type returned by the production server? So then the mystery became not 'what?' but 'who?' I googled around for a bit and found a bug filed in JIRA with all the same symptoms as the problem I was running into; the solution posted in the comments of the bug said that the problem was with mod_caucho. I worked with the ISP that hosts the production instance of Clearspace and got them to swap out mod_caucho for mod_proxy to isolate that piece of the puzzle, and sure enough, the problem went away. Our ISP recommended that we not settle for mod_proxy for the entire site, and instead wrote up a nifty solution using mod_rewrite and mod_proxy, which I've pasted below:

 RewriteRule ^/clearspace/rpc/soap(/?(.*))$ to://www.example.com:8080/clearspace/rpc/soap$1
 RewriteRule ^to://([^/]+)/(.*)    http://$1/$2   [E=SERVER:$1,P,L]
 ProxyPassReverse /community/rpc/soap/ http://www.example.com/clearspace/rpc/soap/
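For context, here's a fuller, hypothetical version of those rules with the directives they assume spelled out (the hostnames are the example.com placeholders from above, and the two-rule trick works because within one ruleset the second RewriteRule operates on the output of the first):

```apache
RewriteEngine On

# Map SOAP requests onto a private "to://" pseudo-scheme that carries
# the backend host and port.
RewriteRule ^/clearspace/rpc/soap(/?(.*))$ to://www.example.com:8080/clearspace/rpc/soap$1

# Unpack the pseudo-scheme and proxy the request to the backend:
# E= sets an environment variable, P proxies, L stops processing.
RewriteRule ^to://([^/]+)/(.*) http://$1/$2 [E=SERVER:$1,P,L]

# Rewrite Location headers in backend responses back to the public URL.
ProxyPassReverse /community/rpc/soap/ http://www.example.com/clearspace/rpc/soap/
```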

Hope that helps someone down the road!
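As a footnote, you don't strictly need a packet capture to spot this particular symptom: a raw HTTP request that prints the response's Content-Type shows the same thing 'Follow TCP Stream' did. Here's a self-contained sketch; the stub server is hypothetical, standing in for the misbehaving proxy by stamping text/plain on an MTOM-style body (a body whose first character is '-', which is exactly the "Unexpected character '-' (code 45) in prolog" XFire complained about), and the path is just the placeholder from above:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stub mimicking the broken proxy: it returns a
# multipart/MTOM-shaped body but stamps the wrong Content-Type header.
class BrokenProxyStub(BaseHTTPRequestHandler):
    def do_POST(self):
        body = b'------=_Part_5_25686393.1183413571061\r\n' \
               b'Content-Type: application/xop+xml; charset=UTF-8\r\n'
        self.send_response(200)
        # A healthy response would say: multipart/related; type="application/xop+xml"
        self.send_header('Content-Type', 'text/plain; charset=UTF-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(('127.0.0.1', 0), BrokenProxyStub)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_port)
conn.request('POST', '/clearspace/rpc/soap', body=b'<soap:Envelope/>',
             headers={'Content-Type': 'text/xml'})
resp = conn.getresponse()
content_type = resp.getheader('Content-Type')
print(content_type)  # → text/plain; charset=UTF-8
server.shutdown()
```

A SOAP client that trusts that header will try to parse the MIME boundary line as XML and fail at the first '-'.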

Free Ticket to OSCON

Gotcha! Please excuse the corporate shilling for a second: the company I work for, Jive Software, is running a pretty cool promotion right now: write up a blog post about how you or your company is using Clearspace, Jive Forums, Openfire or Spark, and if your blog post is selected as the most thorough, detailed, entertaining and well.. you get the idea. Bottom line: if you win, Jive will either pay for your airfare and hotel in Portland or buy your ticket to OSCON (which is happening in only a couple of weeks!). You can get the whole scoop over on the Jive Talks blog.

Links: 7-4-2007

Links: 7-2-2007

Data finds data

Jon Udell recently blogged about the way in which he ‘connected’ paths with a number of people and introduced acquaintances to one another via blogging / publishing and bookmark sharing. One of the Important Takeaways™ is that tagging, blogging and social bookmarking tools are a great way of saying, in a (hopefully) machine-readable format, “what I’m thinking about” and “what I’m an expert in”.

At Jive Software, working on the Clearspace team, we’ve tried to make it easy to find experts, going so far as to add some really novel expertise searching and profiling. You can search for a person using a keyword and any number of profile fields against a specific space or community to find someone who might know the answer to the problem you’re facing. That’s useful, but I think the real value will come from the kind of thing Jon mentions in his story; I’ll call it non-directed expert search.

Imagine for a moment a giant faceless corporation with tall walls between departments. Don from the Widget team blogs about replicated and distributed data management on a regular basis. He’s busy (he just had another kid) and doesn’t have time to go looking for experts in his field, but because he’s blogging and tagging in the regular course of his work, Scott from the FooBar team can subscribe to a tag feed like ‘caching’ or ‘reliability’ and catch one of Don’s blog posts. Just by making it really easy to publish blog posts and bookmarks, and by making those posts and bookmarks searchable, taggable and syndicated, your employees and community members can connect without having to search for an expert: the experts present themselves. Or, to borrow a line from the aforementioned blog post:

…Data tends to find data. And when it does, people find each other.
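The ‘subscribe to a tag feed’ step in the scenario above is simple to sketch. This is a hypothetical miniature, not Clearspace’s actual feed format or API; it just filters RSS items by their category tags, which is all a tag-feed subscription really does:

```python
import xml.etree.ElementTree as ET

# A made-up miniature RSS "tag feed"; in a real deployment this XML
# would be fetched from the community's tag feed URL.
FEED = """<rss version="2.0"><channel>
  <title>Tag feed: caching</title>
  <item><title>Replicated data management</title>
        <category>caching</category><category>reliability</category></item>
  <item><title>Widget release notes</title>
        <category>releases</category></item>
</channel></rss>"""

def items_tagged(feed_xml, tag):
    """Return titles of feed items carrying the given category tag."""
    root = ET.fromstring(feed_xml)
    return [item.findtext('title')
            for item in root.iter('item')
            if tag in [c.text for c in item.findall('category')]]

print(items_tagged(FEED, 'caching'))  # → ['Replicated data management']
```

Scott never searches for Don; his feed reader runs the equivalent of this filter and Don's posts simply show up.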