Tuesday 29 May 2007

Soot Datatypes and how they relate to each other

Soot is the underlying manipulation mechanism of DeepWeaver, and while we eventually hope programmers won't have to think about Soot at all, we're not quite there yet. For now, it's helpful to be aware of Soot's behaviour.

Soot breaks compiled Java code into an intermediate representation called Jimple, which represents a code body in terms of units. These units are three-address statements, a form that reduces the number of possible instruction kinds. So, for example, the Java statement:
x = foo + 3 * bar;
would have a Jimple representation that somewhat resembles
i0 = 3 * bar;
x = foo + i0;
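
To make the splitting concrete, here is a minimal sketch of how a nested expression can be lowered into three-address form. It is only an illustration: the Expr, Var, Const and BinOp types and the lower method are made up for this post and are not Soot's or DeepWeaver's API, and real Jimple also carries types and other details that are ignored here.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreeAddressSketch {
    // Hypothetical expression tree, invented for this example (not Soot types).
    interface Expr {}
    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }
    static final class Const implements Expr {
        final int value;
        Const(int value) { this.value = value; }
    }
    static final class BinOp implements Expr {
        final char op;
        final Expr left, right;
        BinOp(char op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    private final List<String> units = new ArrayList<>();
    private int nextTemp = 0;

    // Returns the name or literal holding the value of e, emitting one
    // unit into a fresh temporary for every nested binary operation.
    private String operand(Expr e) {
        if (e instanceof Var) return ((Var) e).name;
        if (e instanceof Const) return Integer.toString(((Const) e).value);
        BinOp b = (BinOp) e;
        String l = operand(b.left);
        String r = operand(b.right);
        String temp = "i" + nextTemp++;
        units.add(temp + " = " + l + " " + b.op + " " + r);
        return temp;
    }

    // Lowers "target = e" into a list of units with at most one operator each.
    static List<String> lower(String target, Expr e) {
        ThreeAddressSketch s = new ThreeAddressSketch();
        if (e instanceof BinOp) {
            BinOp b = (BinOp) e;
            String l = s.operand(b.left);
            String r = s.operand(b.right);
            s.units.add(target + " = " + l + " " + b.op + " " + r);
        } else {
            s.units.add(target + " = " + s.operand(e));
        }
        return s.units;
    }

    public static void main(String[] args) {
        // x = foo + 3 * bar
        Expr e = new BinOp('+', new Var("foo"),
                new BinOp('*', new Const(3), new Var("bar")));
        for (String unit : lower("x", e)) {
            System.out.println(unit);
        }
    }
}
```

Each emitted unit has a single operator and at most two operands, which is the property that keeps the set of instruction kinds small.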

This is important to understand because it shows how DeepWeaver sees the code. Since DeepWeaver deals with bytecode, it is unaware of your original statements. Instead it sees this representation, so a codeblock may actually consist of a group of units, each more atomic than the original statement it was derived from.

This corresponds to a DeepWeaver hierarchy whose types wrap the Soot types. These are:
  • CodeValue, atoms such as constants, locals, etc.
  • Statement, corresponding to the units above. The most important kinds are assignment statements (illustrated above), if statements, return statements and return-void statements.
  • CodeBlock, a set (list?) of statements.
  • SootMethod, a callable method within the program. The method itself is a codeblock, but there are some extras to represent parameters etc.
  • SootClass, corresponding to a Java class.
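
The containment described in the list can be sketched as a toy model. Every type and method below (Value, Stmt, Block, Method, Clazz, example, countStatements) is invented for illustration and is not DeepWeaver's or Soot's actual API. The sketch also assumes a block keeps its statements in order, which the list above leaves open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HierarchySketch {
    // An atom such as a constant or local (stands in for CodeValue).
    static final class Value {
        final String text;
        Value(String text) { this.text = text; }
    }

    // The statement kinds named in the list above.
    enum Kind { ASSIGN, IF, RETURN, RETURN_VOID }

    // One unit (stands in for a statement).
    static final class Stmt {
        final Kind kind;
        final List<Value> operands;
        Stmt(Kind kind, Value... operands) {
            this.kind = kind;
            this.operands = Arrays.asList(operands);
        }
    }

    // A group of statements (stands in for CodeBlock); ordering is an assumption.
    static final class Block {
        final List<Stmt> stmts = new ArrayList<>();
    }

    // A method is a block plus extras such as its parameters.
    static final class Method {
        final String name;
        final List<String> paramTypes;
        final Block body = new Block();
        Method(String name, String... paramTypes) {
            this.name = name;
            this.paramTypes = Arrays.asList(paramTypes);
        }
    }

    // A class holds its methods.
    static final class Clazz {
        final String name;
        final List<Method> methods = new ArrayList<>();
        Clazz(String name) { this.name = name; }
    }

    // Builds a class with one method whose body is the two units from the
    // earlier example followed by a return.
    static Clazz example() {
        Method m = new Method("compute", "int", "int");
        m.body.stmts.add(new Stmt(Kind.ASSIGN,
                new Value("i0"), new Value("3"), new Value("bar")));
        m.body.stmts.add(new Stmt(Kind.ASSIGN,
                new Value("x"), new Value("foo"), new Value("i0")));
        m.body.stmts.add(new Stmt(Kind.RETURN, new Value("x")));
        Clazz c = new Clazz("Example");
        c.methods.add(m);
        return c;
    }

    // Walks class -> method -> block -> statement and counts the statements.
    static int countStatements(Clazz c) {
        int n = 0;
        for (Method m : c.methods) {
            n += m.body.stmts.size();
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countStatements(example()));
    }
}
```

The point of the sketch is only the nesting: values sit inside statements, statements inside blocks, blocks inside methods, and methods inside classes.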
