The Hamburg Dependency Treebank

The Hamburg Dependency Treebank (HDT) is, to our knowledge, the largest dependency treebank currently available. Its annotations are genuine dependency annotations, i.e. they were created directly as dependency structures and not converted from phrase structures. The HDT is free for scientific/academic use.

All sentences were sourced from the German news site heise.de, from articles published between 1996 and 2001. The content of the articles ranges from formulaic periodic updates on new BIOS revisions, processor models and quarterly earnings of tech companies, through features about general trends in the hardware and software market, to general coverage of social, legal and political issues in cyberspace, sometimes in the form of extensive weekly editorial comments. The mapping from sentences to articles and authors is retained, which allows, for example, the analysis of individual writing style. The manual annotation of the treebank was largely interleaved with the development of a standard for the morphological and syntactic annotation of sentences and with the development of a constraint-based parser.

If you have questions regarding the HDT, send an email to hdt at informatik.uni-hamburg.de

[Figure: annotated dependency tree] An example sentence annotated according to the HDT schema.

The HDT consists of three parts:

Download the HDT from the HZSK

UD conversion

There is a Universal Dependencies (UD) conversion of the HDT, produced with our TrUDucer tool. It has been part of the UD releases since version 2.4 and can be obtained from the UD_German-HDT GitHub repository. The dev branch contains the newest conversion. The conversion currently comprises nearly 3.4M tokens from parts A and B.
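UD treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns, comment lines starting with "#", and a blank line after each sentence. The following is a minimal sketch, not part of the HDT or TrUDucer tooling, of how such a file could be read and its sentences and tokens counted with plain Python. The function names and the sample sentence are invented for illustration and are not taken from the treebank.

```python
from typing import Iterable, Iterator, List, Tuple

# (id, form, head, deprel) for one syntactic word
Token = Tuple[int, str, int, str]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence from CoNLL-U formatted lines."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# text = ..."
        cols = line.split("\t")
        # Keep only regular word lines; multiword ranges ("1-2") and
        # empty nodes ("1.1") have non-integer IDs and are skipped.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        head = int(cols[6]) if cols[6].isdigit() else -1  # -1: head not given
        sentence.append((int(cols[0]), cols[1], head, cols[7]))
    if sentence:
        yield sentence  # file did not end with a blank line


def count_tokens(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sentences, number of tokens)."""
    n_sentences = n_tokens = 0
    for sent in read_conllu(lines):
        n_sentences += 1
        n_tokens += len(sent)
    return n_sentences, n_tokens


# Invented sample sentence (not from the HDT), built with explicit tabs.
SAMPLE = [
    "# text = Der Hund schläft .",
    "\t".join(["1", "Der", "der", "DET", "ART", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "Hund", "Hund", "NOUN", "NN", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "schläft", "schlafen", "VERB", "VVFIN", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "$.", "_", "3", "punct", "_", "_"]),
    "",
]

if __name__ == "__main__":
    print(count_tokens(SAMPLE))
```

To run the same count on a downloaded file, the lines can be streamed from disk, for example with `open(path, encoding="utf-8")` passed to `count_tokens`, where `path` points to a `.conllu` file from the repository. Streaming line by line keeps memory use low, which matters for a treebank of this size.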

Publications

Software
