TrUDucer: Transducer File Syntax
Short documentation of the Transducer file syntax used by the TrUDucer tool.
Comments
Line comments start with a #. There are no block comments.
expansions (lists)
You can define shorthands for groups of labels like this:
expansion predicatePos = [VAFIN, VVFIN, VVPP, VVINF, VVIZU, ADJD, ADJA];
An expansion name can then be used in a rule wherever a single label would otherwise be written.
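To picture what an expansion does, here is a minimal Python sketch. It is not TrUDucer code: `EXPANSIONS` and `label_matches` are names invented for illustration, and the sketch assumes that a constraint naming an expansion accepts any member of that list.

```python
# Hypothetical model of expansions; not part of TrUDucer itself.
EXPANSIONS = {
    "predicatePos": ["VAFIN", "VVFIN", "VVPP", "VVINF", "VVIZU", "ADJD", "ADJA"],
}


def label_matches(constraint: str, actual: str) -> bool:
    """Return True if the tag `actual` satisfies `constraint`.

    Assumption: a constraint that names an expansion accepts any member
    of that expansion; any other constraint must equal the tag exactly.
    """
    if constraint in EXPANSIONS:
        return actual in EXPANSIONS[constraint]
    return constraint == actual
```

Under this reading, a rule restricted to predicatePos behaves like seven rules, one per listed tag.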
Rule Syntax
A rule consists of a match-tree, which is matched against each frontier subtree, and a replacement tree, which replaces the subtree if the match succeeds:
MATCHING_TREE -> REPLACEMENT_TREE;
tree structure
The tree structure is written as treenode(subtreenode1, subtreenode2, ...) and can be nested arbitrarily. The subelements are unordered! Each element is a variable, and it has to occur on the right-hand side as well. The match-tree and the replacement tree use the same notation.
Dependency Relation label
A node can be restricted by its dependency label: n:PRED
only matches nodes with the PRED label.
Part of Speech tag
You can also restrict matches by PoS:
n.VVFIN:KON
only matches nodes with PoS=VVFIN and dependency label=KON.
Variables starting with ? are catch-all variables that match all sub nodes not matched by other variables. They can contain an arbitrary number of nodes.
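The matching behaviour described so far (unordered children, label and PoS restrictions, catch-all variables) can be sketched in Python as follows. This is a simplified model, not the TrUDucer implementation: `Node`, `Pattern` and `match` are hypothetical names, and the sketch assumes that a pattern without a catch-all variable fails when the node has children left over.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    form: str
    pos: str
    deprel: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    var: str                         # variable name, e.g. "n"
    pos: Optional[str] = None        # PoS restriction, e.g. "VVFIN"
    deprel: Optional[str] = None     # dependency label restriction, e.g. "KON"
    children: List["Pattern"] = field(default_factory=list)
    catch_all: Optional[str] = None  # catch-all variable, e.g. "?r"


def match(pat: Pattern, node: Node) -> Optional[Dict[str, object]]:
    """Return variable bindings if `node` matches `pat`, else None."""
    if pat.pos is not None and node.pos != pat.pos:
        return None
    if pat.deprel is not None and node.deprel != pat.deprel:
        return None
    return _match_children(pat, pat.children, node.children, {pat.var: node})


def _match_children(pat, pats, nodes, bindings):
    if not pats:
        if pat.catch_all is None:
            # Assumption: unmatched children make the match fail.
            return bindings if not nodes else None
        # The catch-all takes every child no other variable claimed.
        return {**bindings, pat.catch_all: list(nodes)}
    first, rest = pats[0], pats[1:]
    # Children are unordered, so try each remaining node (with backtracking).
    for i, candidate in enumerate(nodes):
        sub = match(first, candidate)
        if sub is None:
            continue
        remaining = nodes[:i] + nodes[i + 1:]
        result = _match_children(pat, rest, remaining, {**bindings, **sub})
        if result is not None:
            return result
    return None


# Models the left-hand side n.VVFIN:KON(o:OBJA(), ?r).
tree = Node("sieht", "VVFIN", "KON", [
    Node("Hund", "NN", "SUBJ"),
    Node("Katze", "NN", "OBJA"),
    Node("oft", "ADV", "ADV"),
])
pattern = Pattern("n", pos="VVFIN", deprel="KON",
                  children=[Pattern("o", deprel="OBJA")], catch_all="?r")
bindings = match(pattern, tree)
```

In this example o binds to the OBJA child regardless of its position among the children, and ?r collects the two children that no other variable matched.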
above/below frontier specification
In the match-tree you can match above the frontier (i.e. match already converted nodes) by explicitly setting the frontier:
parentNotInFrontier({nodeInFrontier}, nodeOutOfFrontier)
Here the "parent" node is already translated and the current frontier is below the parent node. If you don't declare the position of the frontier, it is assumed to be above the root of the left-hand side.
variable dependency label
If you need to re-use a dependency label, you can match the label of
a node with $var like this:
x:$label(....) -> y:$label
y will have the label x had before the translation. This usually only makes sense for already translated nodes.
groovy scripts
restrict rule matching
You can add Groovy code to restrict matching based on the current tree and the resulting tree:
p({n:APP()}) -> p(n:compound()) :- {n.getOrd() < p.getOrd()};
will only be applied if n is left of p. The variables are cz.ufal.udapi.core.Node objects. You can access the elements of the resulting tree by prefixing the variable name with an underscore, e.g. _n and _p. The rule is only applied if the Groovy code returns true. You can change the data structures from Groovy, but be careful with that! One use case would be to mark nodes for further manual inspection.
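The effect of such a guard can be modelled in a few lines of Python. This is only a sketch of the control flow: the `Node` class, `guard` and `apply_if_allowed` are hypothetical stand-ins, and the real guard is Groovy code operating on udapi nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    form: str
    ord: int  # position of the word in the sentence


def guard(n: Node, p: Node) -> bool:
    # Mirrors {n.getOrd() < p.getOrd()}: n must stand left of p.
    return n.ord < p.ord


def apply_if_allowed(n: Node, p: Node, rewrite) -> bool:
    """Run `rewrite` only when the guard holds; report whether it ran."""
    if not guard(n, p):
        return False
    rewrite(n, p)
    return True
```

A rule whose structure matches but whose guard returns false leaves the tree untouched, so another rule can still apply to the same subtree.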
startup and cleanup
With @startup {groovycode} and @cleanup {groovycode} you can specify scripts that are executed once. The startup script is always executed when the transducer is loaded. The cleanup script is only executed after a convall call on the command line or after converting via the session tool.
dictionary access
You can define and access lists in the Groovy script which are shared across all script calls and scopes. To do so, use the dict object inside the Groovy script:
dict.zahlen = ["one", "two", "three"];
will set the list "zahlen" for all future script calls. It can then be accessed, for example, with
dict.zahlen.contains("three")
dict acts like any Map<String, List<String>>, except that looking up a missing key never returns null: dict.unknown_key returns an empty list instead.
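The lookup behaviour can be illustrated with a small Python model. `SharedDict` is a hypothetical name, and whether the real dict object stores the empty list on a failed lookup is not specified here; this sketch does not store it.

```python
class SharedDict(dict):
    """Hypothetical stand-in for the shared `dict` object."""

    def __missing__(self, key):
        # Unknown keys yield an empty list instead of raising KeyError;
        # in this sketch nothing is stored for the missing key.
        return []

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so names of
        # built-in dict methods are not redirected.
        if name.startswith("__"):
            raise AttributeError(name)
        return self[name]

    def __setattr__(self, name, value):
        # d.zahlen = [...] stores the list under the key "zahlen".
        self[name] = value


shared = SharedDict()
shared.zahlen = ["one", "two", "three"]
```

With this model, a check such as `"three" in shared.unknown_key` is simply false and needs no prior test for a missing key, which is presumably the point of returning an empty list.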
For persistence across multiple sessions, dict.writeToFile("fileName") and dict.readFromFile("filename") can be used in the @startup and @cleanup scripts.