Tuesday 31 July 2007

Datatypes in DeepWeaver

DeepWeaver introduces limited type safety, as previously described.
To use types, you can use either the built-in types or any other Java class, specified by its full long-form name (e.g. java.awt.event.ActionListener).
If you're feeling really brave you can add your own datatype (although I've never seen a need to do so): just add it to the list in dw.type.Types and then add your own corresponding class file.
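As a rough illustration of why the full long-form name matters: a class name is looked up exactly as written, so a short name on its own won't resolve. The sketch below assumes DeepWeaver resolves non-built-in types through the standard Java class loader, which this post doesn't confirm. TypeLookup and resolve are hypothetical names, and java.util.ArrayList simply stands in for any library class.

```java
import java.util.Optional;

// Hypothetical helper: DeepWeaver's real type lookup is not described in this post.
public class TypeLookup {

    // Try to resolve a type name with the standard class loader.
    // Returns Optional.empty() if no class with exactly that name is visible.
    static Optional<Class<?>> resolve(String name) {
        try {
            // initialize = false: we only want the Class object, not its static initialisers
            return Optional.of(Class.forName(name, false, TypeLookup.class.getClassLoader()));
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("java.util.ArrayList").isPresent()); // true: full long-form name
        System.out.println(resolve("ArrayList").isPresent());           // false: short name only
    }
}
```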

Here's a brief run-down of the built-in types:
  • String: Text in double quotes. Usually used for matching, printing or restriction
  • List: A Prolog-style list, but shorthand list notation is not yet available
  • Stmt: Corresponds to the Soot Unit type. A line or unit of code, which also retains information about the methods and values associated with it
  • Value: Corresponds to the Soot Value type. A code value that also stores information about its declaring unit (but it may be linked to by more than one unit!)
  • Block: A CodeBlock, or series of units, usually in an order that relates to their execution. It is implemented as a chain of units, and the unit(cb,u) predicate can be used to split it into its component units. It also stores information about its parent method and synchronization
  • Method: An analysis of an entire method, including information about its body, declaring class and synchronization
  • Class: An analysis of an entire class, corresponding to soot.SootClass

Tuesday 24 July 2007

Running DeepWeaver from the command line

Calling DeepWeaver from the command line is pretty similar to calling any other Java program, i.e. not very pretty! The basic format is:
java -cp CLASSPATH dw.DW script.dw org.mytarget.Target

where
  • CLASSPATH contains all the JARs required by DeepWeaver (these may instead be placed in the $CLASSPATH environment variable if you prefer); entries should be separated by a colon (a semicolon on Windows). For more info, run java -help
  • script.dw is the name of the predicate script you want to run
  • org.mytarget.Target is the fully qualified name of the class you want to run it on; you can specify as many targets as you like
Commonly used parameters are
  • -javac "javac -cp ~/workspace/dw/build/classes"
    • Tells DeepWeaver to use custom Java compiler parameters, which is nearly always necessary. In this case the Java compiler will add my classes directory to its classpath.
  • -addcp /home/user/lib
    • Adds a custom class path for inclusion in DeepWeaver execution. Any libraries you need that can't be found directly from your current working directory must be added. This should point to either a root folder (e.g. if DeepWeaver is looking for org.apache.catalina.Role, it must be at /home/user/lib/org/apache/catalina/Role.class) or a JAR file. If you have lots of JAR files, the easy way around this is to create a library folder and extract all your JAR files into it; that way you only ever have to specify one -addcp parameter
  • -time
    • displays timing information
  • -v
    • verbose output
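The "extract all your JARs into one folder" trick for -addcp can be done from the shell by running the JDK's jar xf once per JAR inside the library folder, or with a small Java helper like the sketch below. InflateJars is a hypothetical name, not part of DeepWeaver; it just unpacks every JAR into one directory and refuses entries whose paths would escape it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.*;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: unpack several JARs into one folder for a single -addcp.
public class InflateJars {

    // Returns the number of files written.
    static int inflate(Path libDir, Path... jars) throws IOException {
        Path root = libDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        int files = 0;
        for (Path jar : jars) {
            try (JarFile jf = new JarFile(jar.toFile())) {
                Enumeration<JarEntry> entries = jf.entries();
                while (entries.hasMoreElements()) {
                    JarEntry e = entries.nextElement();
                    Path target = root.resolve(e.getName()).normalize();
                    // Refuse entries such as "../x" that would land outside libDir.
                    if (!target.startsWith(root)) {
                        throw new IOException("Unsafe entry " + e.getName() + " in " + jar);
                    }
                    if (e.isDirectory()) {
                        Files.createDirectories(target);
                        continue;
                    }
                    Files.createDirectories(target.getParent());
                    try (InputStream in = jf.getInputStream(e)) {
                        // On name clashes the later JAR wins (e.g. META-INF/MANIFEST.MF).
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    files++;
                }
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java InflateJars <libDir> <jar>...");
            return;
        }
        Path[] jars = new Path[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            jars[i - 1] = Paths.get(args[i]);
        }
        System.out.println(inflate(Paths.get(args[0]), jars) + " files extracted");
    }
}
```

For example, java InflateJars /home/user/lib a.jar b.jar followed by a single -addcp /home/user/lib. If two JARs ship the same class, only the last one extracted survives, so it is worth checking for duplicates.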
You can also specify targets en masse by prefixing a package name with the + symbol, e.g.
+org.apache.catalina

Any classes referenced by the analysed code will be loaded dynamically if they can be found.

Confused? Here's a sample command for Linux
java -cp jars/antlr-2.7.5.jar:jars/sootclasses-2534-1.jar:jars/jasminclasses-2327.jar:jars/polyglotclasses-1.3.2.jar:build/classes dw.DW -javac "javac -cp $HOME/workspace/dw/build/classes" -addcp ~/workspace/webgoat_src/build/WEB-INF/classes/ -addcp ~/jars transact.dw +org.owasp.webgoat

This isn't as bad as it may seem!
  • Our classpath holds the 4 JAR files DeepWeaver depends on plus the build/classes folder, specified using -cp (Eclipse adds these automatically if you run DeepWeaver from it)
  • -javac tells the Java compiler to add the dw/build/classes folder in your workspace to its classpath
  • The -addcp parameters add the compiled WebGoat classes and the library folder of extracted JAR files to the path
  • transact.dw is the script to run on the target classes
  • We want to analyse all code within org.owasp.webgoat
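If the quoting gets painful, the same sample invocation can be assembled from Java, with each argument as a separate list element so the -javac value needs no shell quoting. This is a sketch: DwCommand is a hypothetical helper, it assumes the Eclipse ${workspace_loc} variable corresponds to ~/workspace, and because jars/ and build/classes are relative paths it has to be run from the DeepWeaver project directory. File.pathSeparator supplies the colon on Linux (a semicolon on Windows).

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles the sample DeepWeaver command shown above.
public class DwCommand {

    // Assumption: ${workspace_loc} corresponds to <home>/workspace.
    static List<String> build(String home) {
        // File.pathSeparator is ":" on Linux and ";" on Windows.
        String cp = String.join(File.pathSeparator,
                "jars/antlr-2.7.5.jar",
                "jars/sootclasses-2534-1.jar",
                "jars/jasminclasses-2327.jar",
                "jars/polyglotclasses-1.3.2.jar",
                "build/classes");
        return Arrays.asList(
                "java", "-cp", cp, "dw.DW",
                // A single list element, so the embedded spaces need no shell quoting.
                "-javac", "javac -cp " + home + "/workspace/dw/build/classes",
                "-addcp", home + "/workspace/webgoat_src/build/WEB-INF/classes/",
                "-addcp", home + "/jars",
                "transact.dw",
                "+org.owasp.webgoat");
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build(System.getProperty("user.home"));
        // Printed for inspection only; this line is not shell-quoted.
        System.out.println(String.join(" ", cmd));
        // To launch it from the DeepWeaver project directory:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```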

Monday 23 July 2007

How to use the between predicate

The between predicate is an invaluable tool in your troubleshooting arsenal, but it can also be quite tricky to use. The format of the predicate is:
between(a,z,x,b)

a and z should be code units that represent the start and end of the between block.
x is the code between a and z. If x is unbound (output), the results come one Soot unit (one line of analysed code) at a time. If x is bound (input), it should be a unit (or unit box) of one or more lines of code.
b is a boolean that selects between must (true) and may (false) analysis. For example, code in an if block between a and z may not be executed on the way from a to z, so it will be included in a may analysis but not in a must analysis.
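To make the must/may distinction concrete, here is a toy model on a hand-built control-flow graph, where node names stand in for units. It is not DeepWeaver's implementation; it simply takes "may" to mean on at least one path from a to z and "must" to mean on every such path, which matches the if-block example above. BetweenModel, reach and the node names are all hypothetical.

```java
import java.util.*;

// Toy model only: node names stand in for Soot units.
public class BetweenModel {

    private final Map<String, List<String>> succ = new HashMap<>();

    void edge(String from, String to) {
        succ.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // All nodes reachable from start (start included) without entering a node in blocked.
    Set<String> reach(String start, Set<String> blocked) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (blocked.contains(n) || !seen.add(n)) continue;
            for (String s : succ.getOrDefault(n, Collections.emptyList())) work.push(s);
        }
        return seen;
    }

    // must == false: nodes on at least one path from a to z.
    // must == true: nodes on every path from a to z.
    SortedSet<String> between(String a, String z, boolean must) {
        SortedSet<String> result = new TreeSet<>();
        for (String n : reach(a, Set.of(z))) {
            if (n.equals(a) || !reach(n, Set.of()).contains(z)) continue; // not on any a->z path
            if (must && reach(a, Set.of(n)).contains(z)) continue;       // some a->z path avoids n
            result.add(n);
        }
        return result;
    }

    // a -> cond -> thenBranch -> join -> z, plus cond -> join when the condition is false.
    static BetweenModel ifBlockExample() {
        BetweenModel g = new BetweenModel();
        g.edge("a", "cond");
        g.edge("cond", "thenBranch");
        g.edge("cond", "join");
        g.edge("thenBranch", "join");
        g.edge("join", "z");
        return g;
    }

    public static void main(String[] args) {
        BetweenModel g = ifBlockExample();
        System.out.println(g.between("a", "z", false)); // [cond, join, thenBranch]
        System.out.println(g.between("a", "z", true));  // [cond, join]
    }
}
```

The then-branch appears only in the may result, because the path that skips it still reaches z.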

To use between to find the x variable, the most important thing is to ensure that you specify a and z as two different code units. However, units are not necessarily the most intuitive way to specify a and z. For example, you may want to specify a or z in terms of a call. The wrong way to do this is:
between(call(<-,*,*), z, X, false)

This has a number of faults including:
  • There's nothing to stop the result of your call being the same unit as z (an arbitrary unit), which could cause a null pointer exception
  • Your input is likely to give multiple locations, which will give multiple results from between that may not be easily distinguishable from each other
between is very powerful, but you need to be strict about its input. Here is an example of how to use between to find all the code that may lie between two method calls, one to begin and the other to commit.
getUnit(call(<-,p,*), y), getUnit(call(<-,q,*), z), methodMatches(p, "* begin(..)"), methodMatches(q, "* commit(..)"), between(y, z, X, false)

This works because call's middle parameter returns a method, whose name can then be checked with methodMatches to ensure that it is the one we expect. Since the two patterns are different, this also ensures the two results won't be equal. We take the unit from each result so that we are sure between will receive a unit for both parameters. X is the result, possibly multiple lines of code, which we can then test.
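The patterns passed to methodMatches look like AspectJ-style signature patterns. Assuming * stands for any single word (here the return type) and (..) for any parameter list, and assuming the pattern is tested against a Soot-style subsignature such as void begin(int), a matcher might look like the sketch below. SignatureGlob is hypothetical and DeepWeaver's real matching rules may well be richer; the point is that no signature satisfies both * begin(..) and * commit(..), which is why the two units can't coincide.

```java
import java.util.regex.Pattern;

// Hypothetical matcher for patterns like "* begin(..)"; not DeepWeaver's real rules.
public class SignatureGlob {

    // Assumptions: '*' matches any run of non-space characters,
    // "(..)" matches any parameter list, everything else is literal.
    static Pattern compile(String glob) {
        StringBuilder re = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            if (glob.startsWith("(..)", i)) {
                re.append("\\([^)]*\\)");
                i += 4;
            } else if (glob.charAt(i) == '*') {
                re.append("\\S*");
                i++;
            } else {
                re.append(Pattern.quote(String.valueOf(glob.charAt(i))));
                i++;
            }
        }
        return Pattern.compile(re.toString());
    }

    static boolean matches(String glob, String subSignature) {
        return compile(glob).matcher(subSignature).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("* begin(..)", "void begin()"));               // true
        System.out.println(matches("* begin(..)", "void begin(java.lang.String,int)")); // true
        System.out.println(matches("* begin(..)", "void commit()"));              // false
    }
}
```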

Thursday 19 July 2007

Predicate Specification

A couple of notes on how to specify your predicates in DeepWeaver, worth mentioning because they differ from how predicates are specified in Prolog.

When you're specifying a predicate there are a number of reserved keywords, and a few restrictions.
name(fn,sn)=getName(fn,sn);
This is the basic standard specification, fn and sn are bound or unbound variables of any type.
name(in fn, out sn)=getName(fn,sn);
This restricts the standard specification: fn must be bound and sn must be unbound when this predicate is called, otherwise it will be ignored. You could overload it by following it with the first example above.
name(String fn, out sn)=getName(fn,sn);
Now fn must be a String object, but may be bound or unbound, sn must still be unbound.

Points of note:
  • Unless your variable name is one character long, it should start with a lowercase letter, or it will be mistaken for a type binding
  • in and out should not be used as variable names
  • Predicates can be overloaded, but they are overloaded in order (same as Prolog), meaning that if you have a predicate which has no binding or type restrictions, it should be the final specified predicate
  • No error will be thrown if your bindings cause a predicate call that you did not expect
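The ordering rule can be modelled as first-match dispatch. The sketch below is a simplification under stated assumptions: an unbound variable is represented as null, in means bound, out means unbound, and the first definition in declaration order whose restrictions accept the call is the one used. In Prolog proper, later clauses can still be tried on backtracking, and this post doesn't say whether DeepWeaver does the same. OverloadModel and all its names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical model of ordered overloading; null stands for an unbound variable.
public class OverloadModel {

    static final Predicate<Object> ANY = a -> true;      // fn: bound or unbound
    static final Predicate<Object> IN = a -> a != null;  // in fn: must be bound
    static final Predicate<Object> OUT = a -> a == null; // out sn: must be unbound

    static final class Definition {
        final String label;
        final List<Predicate<Object>> params;

        Definition(String label, List<Predicate<Object>> params) {
            this.label = label;
            this.params = params;
        }

        boolean accepts(Object[] args) {
            if (args.length != params.size()) return false;
            for (int i = 0; i < args.length; i++) {
                if (!params.get(i).test(args[i])) return false;
            }
            return true;
        }
    }

    static final Definition RESTRICTED = new Definition("name(in fn, out sn)", Arrays.asList(IN, OUT));
    static final Definition OPEN = new Definition("name(fn, sn)", Arrays.asList(ANY, ANY));

    // Label of the first definition (in declaration order) that accepts the call;
    // empty if none does, in which case the call is silently ignored.
    static Optional<String> dispatch(List<Definition> defs, Object... args) {
        for (Definition d : defs) {
            if (d.accepts(args)) return Optional.of(d.label);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Definition> good = Arrays.asList(RESTRICTED, OPEN); // unrestricted last
        List<Definition> bad = Arrays.asList(OPEN, RESTRICTED);  // unrestricted first
        System.out.println(dispatch(good, "a", null).get());  // name(in fn, out sn)
        System.out.println(dispatch(good, null, null).get()); // name(fn, sn)
        System.out.println(dispatch(bad, "a", null).get());   // name(fn, sn)
    }
}
```

The last call shows the pitfall from the third point: with the unrestricted definition first, the restricted one is never reached, and nothing reports it.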