Tuesday, 29 May 2007

Soot Datatypes and how they relate to each other

Soot is the underlying manipulation mechanism of DeepWeaver, and while the eventual hope is that programmers won't have to think about Soot at all, we're not quite there yet. For now, it's helpful to be aware of Soot's behaviour.

Soot breaks compiled Java code into an intermediate representation called Jimple, which represents the code body in terms of units. These units are three-address statements, which keeps the number of possible instruction forms small. So for example, the Java statement:
x = foo + 3 * bar;
would have a Jimple representation that somewhat resembles
i0 = 3 * bar; x = foo + i0

This is important to understand because it illustrates DeepWeaver's awareness of the code. Since DeepWeaver deals with bytecode, it's unaware of your original statements. Instead it sees this representation, so a codeblock may actually consist of a group of units, each more atomic than the original statement it is derived from.
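To make the three-address idea concrete, here's a rough Python sketch that lowers a simple arithmetic assignment into Jimple-like units. It's only an illustration: it uses Python's own `ast` parser as a stand-in for a Java front end, the `lower` function and the `i0`, `i1` temporary naming are my own assumptions, and Soot itself works from bytecode rather than source text.

```python
import ast
import itertools

# Operators this toy lowering understands (an arbitrary small subset).
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def lower(statement: str) -> list[str]:
    """Lower one `target = expression` assignment into three-address units."""
    assign = ast.parse(statement).body[0]
    if not isinstance(assign, ast.Assign) or not isinstance(assign.targets[0], ast.Name):
        raise ValueError("expected a simple assignment such as 'x = a + b'")
    target = assign.targets[0].id
    temps = itertools.count()  # numbers the hypothetical temporaries i0, i1, ...
    units = []

    def operand(node):
        """Return a name or constant; nested operations get their own unit."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left, right = operand(node.left), operand(node.right)
            temp = f"i{next(temps)}"
            units.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    value = assign.value
    if isinstance(value, ast.BinOp):
        # The outermost operation assigns straight to the target, so every
        # emitted unit has at most one operator and two operands.
        left, right = operand(value.left), operand(value.right)
        units.append(f"{target} = {left} {OPS[type(value.op)]} {right}")
    else:
        units.append(f"{target} = {operand(value)}")
    return units
```

Calling `lower("x = foo + 3 * bar")` gives the two units `i0 = 3 * bar` and `x = foo + i0`, the same shape as the Jimple-like form above.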

This then corresponds with a DeepWeaver hierarchy which wraps the Soot types. These are:
  • CodeValue, atoms such as constants, locals, etc...
  • Statements, corresponding with units above, of which the most important ones are assignment statements (illustrated above), if statements, return statements and return-void statements
  • CodeBlock, a set (list?) of statements
  • SootMethods, a callable method within the program. The method itself is a codeblock, but there are some extras to represent parameters etc.
  • SootClasses, corresponding to a Java class.

Monday, 28 May 2007

Modifications of standard Prolog

DeepWeaver changes a number of standard Prolog syntactic features to bring it closer to Java and make pattern cuts more familiar to Java developers.

Declaring a predicate is no longer done using the Prolog "if" operator :-. Instead the = symbol is used.
Terminating a predicate is no longer done with the full stop (.). Instead the semicolon (;) is used. Since the semicolon represents "or" in Prolog, "or" is now written with the pipe character, |.
Comments now take the standard Java form, so instead of % we have // for single-line comments and /* .. */ for multi-line comments.
Negation is slightly different from standard Prolog in that \+ is no longer used to represent negation. Instead, use the not(A) predicate.

Finally, it's worth noting that in a .dw file, only predicates preceded by a question mark will be executed. One should declare all the required predicates and then select the one to be executed by writing ?runMe(A,B); at the end of the file.

eg. well_located(A,B) = statement(A,X), not(between(s, X, B, true));
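
As a way of checking the mapping against standard Prolog, here's a small Python sketch that turns a single DeepWeaver clause back into the Prolog spelling. The `dw_to_prolog` helper is hypothetical (it isn't part of DeepWeaver), it only handles one clause at a time with no comments or string literals, and it leaves not(...) alone.

```python
def dw_to_prolog(clause: str) -> str:
    """Rewrite one DeepWeaver clause using standard Prolog punctuation.

    Assumes a single clause with no comments or quoted strings; this is a
    text-level illustration of the syntax changes, not a real parser.
    """
    text = clause.strip()
    if not text.endswith(";"):
        raise ValueError("a DeepWeaver clause is terminated by ';'")
    text = text[:-1].rstrip()        # drop the ';' terminator first...
    text = text.replace("|", ";")    # ...so that '|' can safely become Prolog's "or"

    if text.startswith("?"):
        # A '?'-prefixed predicate is the one to execute: a Prolog query.
        return "?- " + text[1:].strip() + "."

    # The first '=' separates head from body and plays the role of ':-'.
    head, sep, body = text.partition("=")
    if sep:
        text = f"{head.rstrip()} :- {body.lstrip()}"
    return text + "."
```

For the example above, `dw_to_prolog("well_located(A,B) = statement(A,X), not(between(s, X, B, true));")` returns `well_located(A,B) :- statement(A,X), not(between(s, X, B, true)).`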

Wednesday, 23 May 2007

Predicate Library Summary

A summary of the built-in DeepWeaver predicates and their expected functions.
  • allBoxIter(A)
  • args(A,B) A call to method A has a list of argument objects B
  • assign(A,B,L) Variable A is assigned value B at location L
  • between(a,z,X,B) X is a codeblock between ground/atomic codeblocks a and z if boolean B is true. If B is false, then X may or may not be between a and z.
  • body(a,B) Finds a method body location B in an atom or ground variable a. The code block location can only be in the bodies of real, non-abstract methods, which should be included in a.
  • call(A,B,C) Instruction A calls method B which takes some sort of object C.
  • class(A) A is a class. Returns any class if A is unbound
  • defBoxes(a,B) Finds a definition box B within unit a
  • doms(A,B,C)
  • element(A,N,B)
  • end(a,B) B is the soot-generated final piece of code of method a. See start.
  • forall(A,B) Standard Prolog forall, for all cases of A, B is true.
  • jump(A) Returns a statement which causes a jump in the code
  • target(A,L) Returns the location L of a jump in the code at statement A
  • method(A,B)
  • methodMatches(A,B)
  • member(A,B) Standard Prolog member, returns true if A is an element of list B
  • name(A,B)
  • not(X) Standard Prolog negation, if X cannot be proved, this statement returns true.
  • parentMethod(a,B) Gets the entire parent method of atom or ground variable a as a codeblock bound to B.
  • path(A,B,C)
  • precedes(A,B) Code block A occurs at or before Code block B in a method (or logical Soot code flow)
  • pred(A,B)
  • print(A) Prints the type and toString() result of A
  • sameValue(A,B)
  • sootmethod(A,B)
  • start(a,B) B is the Soot-generated first code piece of method a. Note that this is Soot-generated, so the start block includes all code that comes before the first method call, loop block, or other non-trivial piece of code.
  • statement(A,B) A is the location of the statement that results in B.
  • succ(A,B) Same as pred(B,A)
  • type(A,a) Restricts A to be of the same type as a.
  • units(A,B)
  • unused(A)
  • useboxes(A,B)
  • uses(A,L) Code block A is used at location L.
  • value(A,B) B is the value of variable A, returned as a CodeValue. Common usage: value(A,X), value(B,X).

Insertion of return values

Prolog predicates don't have return values, but often a predicate only works one way, meaning that an unbound variable essentially acts as a return value. While this detracts from the declarative power of Prolog, it certainly reduces the number of temporary variables hanging around programs.

So, to shorten the pattern cut implementations and reduce the number of temporary variables, we use the notation <-
eg. We can rewrite bar(X), foo(X,c) as foo(bar(<-),c)

In useful terms, one nice rewrite we could have is

statement(Ta,CallA), statement(Tb, CallB), precedes(Ta,Tb)
rewrites to the much shorter and more intuitive
precedes(statement(<-,CallA), statement(<-,CallB))

More extreme examples that certainly cut down on the number of inserted variables but potentially at the cost of legibility could be
dominates(a,b) = parentMethod(a,Ta), body(Ta,Tb), start(Tb, Tc), between(Tc,b,a,true)
which shortens to
dominates(a,b) = between(start(body(parentMethod(a,<-),<-),<-),b,a,true)

It's the type of predicate that's handy to have in your arsenal, but it's good to be aware of its impact on readability.
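
To spell out what the <- rewrite does, here's a rough Python sketch that expands the short form back into the long one. This is my own reading of the examples above, not DeepWeaver's actual implementation: the `desugar` and `split_args` helpers and the fresh variable names T1, T2, ... are all assumptions, and the input is taken to be a single well-formed goal.

```python
import itertools


def split_args(text: str) -> list[str]:
    """Split an argument string on commas that are not inside parentheses."""
    args, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        args.append(current.strip())
    return args


def desugar(goal: str) -> str:
    """Expand every nested term containing '<-' into its own goal."""
    fresh = itertools.count(1)  # numbers the hypothetical variables T1, T2, ...
    goals = []

    def expand(term: str) -> str:
        term = term.strip()
        if "(" not in term:
            return term  # plain atom or variable
        functor, rest = term.split("(", 1)
        new_args, result_var = [], None
        for arg in split_args(rest[:-1]):  # rest[:-1] drops the closing ')'
            if arg == "<-":
                # The marked slot becomes a fresh variable: the "return value".
                result_var = f"T{next(fresh)}"
                new_args.append(result_var)
            else:
                new_args.append(expand(arg))
        rebuilt = f"{functor}({','.join(new_args)})"
        if result_var is None:
            return rebuilt  # ordinary term, left in place
        goals.append(rebuilt)  # hoisted so it runs before the enclosing goal
        return result_var

    goals.append(expand(goal))
    return ", ".join(goals)
```

With this, `desugar("foo(bar(<-),c)")` gives `bar(T1), foo(T1,c)`, and the dominates body `between(start(body(parentMethod(a,<-),<-),<-),b,a,true)` expands to `parentMethod(a,T1), body(T1,T2), start(T2,T3), between(T3,b,a,true)`. Innermost calls come out first, which is the order the long-hand versions are written in.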