Cross site scripting: removing meta-characters from user-supplied data in CGI scripts using C#, Java and ASP

Ran into some issues with cross site scripting attacks today. CERT® has an excellent article that shows exactly how you should be filtering input from forms. Specifically, it mentions that just filtering *certain* characters in user-supplied input isn’t good enough. Developers should be doing the opposite and only explicitly allowing certain characters. Using

… this method, the programmer determines which characters should NOT be present in the user-supplied data and removes them. The problem with this approach is that it requires the programmer to predict all possible inputs that could possibly be misused. If the user uses input not predicted by the programmer, then there is the possibility that the script may be used in a manner not intended by the programmer.
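
To make the difference concrete, here is a minimal Java sketch of the two approaches. The class and method names (AllowListDemo, denyList, allowList) are made up for illustration, and the deny-list deliberately removes only angle brackets to show what happens when the programmer fails to predict an input. It is not taken from the CERT article.

```java
final class AllowListDemo {
    // Deny-list: remove only the characters the programmer thought of.
    static String denyList(String input) {
        return input.replaceAll("[<>]", "");
    }

    // Allow-list: replace every run of characters that is NOT explicitly permitted.
    static String allowList(String input) {
        return input.replaceAll("[^A-Za-z0-9@.' _-]+", "_");
    }

    public static void main(String[] args) {
        // An attribute-injection payload that contains no angle brackets at all.
        String attack = "\" onmouseover=\"alert(1)";
        System.out.println(denyList(attack));  // prints: " onmouseover="alert(1)
        System.out.println(allowList(attack)); // prints: _ onmouseover_alert_1_
    }
}
```

The deny-list lets the payload through untouched because it never anticipated quotes or equals signs, while the allow-list neutralizes it without having to know anything about the attack.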

They go on to show examples of proper usage in both C and Perl, but who uses C and Perl? 😉 Here are the same examples in C#, Java and ASP.

In C#, you’ll make use of the Regex class, which lives in the System.Text.RegularExpressions namespace. I left out the using directives for succinctness here (you can download the entire class using the links at the end of this post), but you simply create a new Regex object, supplying the regular expression pattern you want to look for as an argument to the constructor. In this case, the regular expression matches any run of characters that are not A-Z, a-z, 0-9, the ‘@’ sign, a period, an apostrophe, a space, an underscore or a dash. Each run it finds is replaced with a single underscore.

public static String Filter(String userInput) {
  Regex re = new Regex("([^A-Za-z0-9@.' _-]+)");
  String filtered = re.Replace(userInput, "_");
  return filtered;
}

In Java it’s even easier. Java 1.4 has a regular expression package (which you can read about here) but you don’t even need to use it. The Java String class contains a couple of methods that take a regular expression pattern as an argument. In this example I’m using the replaceAll(String regex, String replacement) method:

public static String Filter(String userInput) {
  String filtered = userInput.replaceAll("([^A-Za-z0-9@.' _-]+)", "_");
  return filtered;
}
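
One caveat: String.replaceAll compiles its pattern on every call. If the filter runs on every request, a variant that uses the java.util.regex package directly and compiles the pattern once might look like the sketch below. The class name CompiledInputFilter and the constant DISALLOWED are my own choices for illustration. The capturing parentheses are dropped because nothing refers to the group.

```java
import java.util.regex.Pattern;

final class CompiledInputFilter {
    // Compiled once and reused; Pattern instances are immutable and safe to share.
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9@.' _-]+");

    // Replaces each run of disallowed characters with a single underscore.
    static String filter(String userInput) {
        return DISALLOWED.matcher(userInput).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(filter("<script>alert('x')</script>"));
        // prints: _script_alert_'x'_script_
    }
}
```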

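Finally, in ASP (VBScript) you’d use the RegExp object in a function like this. Note that this version replaces with an empty string, so disallowed characters are removed rather than swapped for an underscore: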

Function InputFilter(userInput)
  Dim newString, regEx
  Set regEx = New RegExp
  regEx.Pattern = "([^A-Za-z0-9@.' _-]+)"
  regEx.IgnoreCase = True
  regEx.Global = True
  newString = regEx.Replace(userInput, "")
  Set regEx = nothing
  InputFilter = newString
End Function

I think the next logical step would be to write a Servlet filter for Java that analyzes the request scope and automatically filters user input for you, much like the automatic request validation that happens in ASP.NET.
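
As a rough sketch of the core of that idea, the class below filters a whole parameter map in one pass. It deliberately does not depend on the Servlet API so that it stays self-contained. The Map<String, String[]> shape is chosen to match what ServletRequest.getParameterMap() returns, and the class name ParameterSanitizer is hypothetical. In a real filter you would presumably wrap the request (for example with HttpServletRequestWrapper) and serve parameters from the sanitized map. That wiring is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class ParameterSanitizer {
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9@.' _-]+");

    // Returns a new map with every parameter value filtered.
    // The input map is left unmodified; parameter names are copied as-is.
    static Map<String, String[]> sanitize(Map<String, String[]> params) {
        Map<String, String[]> clean = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String[] values = entry.getValue();
            String[] filtered = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                filtered[i] = DISALLOWED.matcher(values[i]).replaceAll("_");
            }
            clean.put(entry.getKey(), filtered);
        }
        return clean;
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new LinkedHashMap<>();
        params.put("name", new String[] {"Bob <b>"});
        System.out.println(sanitize(params).get("name")[0]); // prints: Bob _b_
    }
}
```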

You can download the full code for each of the above examples here:

· InputFilter.cs
· InputFilter.java
· InputFilter.asp

Feel free to comment on the way that you do cross site scripting filtering.

6 thoughts on “Cross site scripting: removing meta-characters from user-supplied data in CGI scripts using C#, Java and ASP”

  1. Here’s a more compact version of the C# code:

    public static String Filter(String userInput) {
      return Regex.Replace(userInput, "([^A-Za-z0-9@.' _-]+)", "_");
    }

    Though for best performance you should ideally keep the regex around between calls, so the pattern doesn’t need to be recompiled each time (I think this is true for all three languages).

  2. I know this is OLD, but eh, does anyone know how I can ALLOW the use of a FORWARD slash. / in the VB Script?

    I have tried "([^A-Za-z0-9@.' _-/]+)"
    and
    "([^A-Za-z0-9@.' _-]+/)"

    but neither seem to work?
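
    A note on the forward-slash question: inside a character class, a dash between two characters denotes a range, so in the first attempt `_-/` is read as "from underscore to slash", which is an invalid (backwards) range. The second attempt puts the slash outside the brackets, so it only matches a run of disallowed characters that is followed by a slash. Adding the slash before the final dash keeps the dash literal. The sketch below checks this in Java. The class name SlashAllowedFilter and the helper isRejected are made up for illustration, and while VBScript's RegExp should follow the same character-class rule, I have only reasoned this through for Java.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class SlashAllowedFilter {
    // '/' is added before the final '-', so the '-' stays a literal dash.
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9@.' _/-]+");

    static String filter(String userInput) {
        return DISALLOWED.matcher(userInput).replaceAll("_");
    }

    // Returns true if the regex engine refuses to compile the pattern.
    static boolean isRejected(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(filter("images/logo-1.png?x=<y>"));  // prints: images/logo-1.png_x_y_
        System.out.println(isRejected("[^A-Za-z0-9@.' _-/]+")); // prints: true
    }
}
```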
