TrUDucer: Transducer File Syntax

Short documentation of the Transducer file syntax used by the TrUDucer tool.

Comments

Line comments start with a #. There are no block comments.
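
A minimal sketch of a commented rule, reusing the rule from the Groovy section of this document without its condition:

```text
# Relabel APP dependents as compound.
p({n:APP()}) -> p(n:compound());
```

The comment sits on its own line here; whether a # may also follow a rule on the same line is not stated in this documentation.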

expansions (lists)

You can define shorthands for groups of labels like this:
expansion predicatePos = [VAFIN, VVFIN, VVPP, VVINF, VVIZU, ADJD, ADJA];
The expansion name can then be used wherever a single label would be written.
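
As a sketch of how this might look in a rule, the expansion name below stands in the position where a single PoS tag would go. This placement and the conj target label are assumptions for illustration, not confirmed syntax:

```text
expansion predicatePos = [VAFIN, VVFIN, VVPP, VVINF, VVIZU, ADJD, ADJA];

# Assumed usage: n matches if its PoS is any tag listed in predicatePos.
p({n.predicatePos:KON()}) -> p(n:conj());
```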

Rule Syntax

A rule consists of a match tree, which is matched against each frontier subtree, and a replacement tree, which replaces the subtree when the match succeeds:
MATCHING_TREE -> REPLACEMENT_TREE;

tree structure

The tree structure is written as treenode(subtreenode1, subtreenode2, ...) and can be nested arbitrarily. The subelements are unordered! Each element is a variable, and every variable on the left-hand side has to occur on the right-hand side. The match tree and the replacement tree use the same notation.
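
For illustration, two tree fragments (not complete rules) with the hypothetical variable names a, b, c and d:

```text
# a has the children b and c; b has one child d.
a(b(d()), c())

# Same structure: the order of the children does not matter.
a(c(), b(d()))
```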

Dependency Relation label

A node can be restricted by its dependency label: n:PRED only matches nodes with the PRED label.

Part of Speech tag

You can also restrict matches by PoS: n.VVFIN:KON only matches nodes with PoS=VVFIN and dependency label=KON.

Variables starting with ? are catch-all variables that match all subnodes not matched by other variables. They can contain an arbitrary number of nodes.
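
A sketch of match-side fragments combining these restrictions; the variable names n and ?rest are arbitrary, and the fragments are not complete rules:

```text
# n must carry the dependency label KON; the PoS is not restricted.
n:KON()

# n must have PoS VVFIN and dependency label KON.
n.VVFIN:KON()

# As above, and ?rest collects every child of n not matched by another variable.
n.VVFIN:KON(?rest)
```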

above/below frontier specification

In the match tree you can match above the frontier (i.e. match already converted nodes) by setting the frontier explicitly with parentNotInFrontier({nodeInFrontier}, nodeOutOfFrontier): the "parent" node is already translated and the current frontier is below it. If you don't declare the position of the frontier, it is assumed to be above the root of the left-hand side.
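
The two rules below sketch the difference, using the APP to compound relabeling that also appears in the Groovy section. The second rule is an assumption derived from the default described above:

```text
# Frontier set explicitly: p is already converted, n lies in the
# frontier below it.
p({n:APP()}) -> p(n:compound());

# No frontier given: it is assumed to lie above n, the root of the
# left-hand side, so no already converted node takes part in the match.
n:APP() -> n:compound();
```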

variable dependency label

If you need to re-use a dependency label, you can match the label of a node with $var like this:
x:$label(....) -> y:$label
y will have the label x had before the translation. This usually only makes sense for already translated nodes.
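
A hypothetical head-swap rule sketches one possible use. The labels PRED and cop and the rule itself are assumptions for illustration:

```text
# p is already converted and carries some label, captured as $label.
# After the rule, n is the head and takes over that label; p becomes
# a child of n with the label cop.
p:$label({n:PRED()}) -> n:$label(p:cop());
```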

groovy scripts

restrict rule matching

You can add Groovy code to restrict matching based on the current tree and the resulting tree:
p({n:APP()}) -> p(n:compound()) :- {n.getOrd() < p.getOrd()};
This rule is only applied if n is to the left of p. The variables are cz.ufal.udapi.core.Node objects. You can access the elements of the resulting tree by prefixing the variable name with an underscore, e.g. _n and _p. The rule is only applied if the Groovy code returns true. You can change the data structures in Groovy, but be careful with that! One use case would be to mark nodes for further manual inspection.
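
As a further sketch, the mirrored condition restricts the same match to dependents on the right of their parent. The target label appos is chosen only for illustration:

```text
# Applied only if n follows p in the sentence.
p({n:APP()}) -> p(n:appos()) :- {n.getOrd() > p.getOrd()};
```

Together with the rule above, this would let a transducer file choose a different label depending on word order.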

startup and cleanup

With @startup {groovycode} and @cleanup {groovycode}, you can specify scripts that are run once. The startup script is always executed when the transducer is loaded. The cleanup script is executed only after a convall call on the command line or after a conversion via the session tool.

dictionary access

You can define and access lists in the Groovy scripts which are shared across all script calls and scopes. To do so, use the dict object inside a Groovy script: dict.zahlen = ["one", "two", "three"];
sets the list "zahlen" for all future script calls; it can then be accessed with, for example, dict.zahlen.contains("three").
dict behaves like a Map<String, List<String>>, except that looking up a key never returns null: dict.unknown_key returns an empty list instead.
For persistence across multiple sessions, dict.writeToFile("fileName") and dict.readFromFile("fileName") can be used in the @startup and @cleanup scripts.
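
Put together, a transducer file might use these pieces as sketched below. The file name zahlen.dict and the call to getForm() on the node are assumptions, as is the premise that the file already exists from an earlier session. The layout of the @startup and @cleanup lines follows the {groovycode} form described above:

```text
@startup {dict.readFromFile("zahlen.dict")}

# Applied only if the word form of n is in the shared list.
p({n:APP()}) -> p(n:compound()) :- {dict.zahlen.contains(n.getForm())};

@cleanup {dict.writeToFile("zahlen.dict")}
```

Because unknown keys yield an empty list, the contains call simply returns false when zahlen has never been set.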
