The CCG site
CCGbank is a translation of the Penn Treebank into a corpus of Combinatory Categorial Grammar derivations, created by Julia Hockenmaier and Mark Steedman. You can get it from the Linguistic Data Consortium. You can also have a look at a demo of the HTML version included in the LDC distribution.
CCGbank pairs syntactic derivations with sets of word-word dependencies that approximate the underlying predicate-argument structure. The translation process and linguistic analyses are explained in the manual.
CCGbank contains 99.44% of the sentences in the Penn Treebank, for which it corrects a number of inconsistencies and errors in the original annotation.
The LDC distribution also includes machine-readable versions of the data, which contain the syntactic derivations and the corresponding lists of word-word dependencies, as well as a file that can be searched with Doug Rohde's TGrep2 (version 1.15).
In all versions, the file structure corresponds exactly to that of the original Treebank.
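Because the file structure mirrors the original Treebank, a Treebank file name can in principle be mapped to its CCGbank counterpart by keeping the section directory and base name. The following is a minimal sketch of that idea in Python. The directory names (`data/AUTO`, `data/PARG`), the extensions (`.auto`, `.parg`), and the helper `ccgbank_counterpart` are assumptions made for illustration and are not stated on this page; check them against the layout of your copy of the LDC distribution.

```python
from pathlib import PurePosixPath


def ccgbank_counterpart(treebank_file: str,
                        subdir: str = "AUTO",
                        ext: str = ".auto") -> str:
    """Map a Treebank file path to a hypothetical CCGbank path.

    Only the section directory (e.g. "00") and the base file name
    (e.g. "wsj_0001") are carried over; `subdir` and `ext` are
    assumed names, not confirmed by the page above.
    """
    path = PurePosixPath(treebank_file)
    section = path.parent.name  # section directory, e.g. "00"
    stem = path.stem            # file name without extension, e.g. "wsj_0001"
    return str(PurePosixPath("data") / subdir / section / (stem + ext))


# Derivations and dependency lists for the same Treebank file
# (both target layouts are assumptions).
print(ccgbank_counterpart("wsj/00/wsj_0001.mrg"))
print(ccgbank_counterpart("wsj/00/wsj_0001.mrg", subdir="PARG", ext=".parg"))
```

A mapping like this makes it easy to line up a CCG derivation with the Treebank tree it was translated from, as long as the assumed directory names are adjusted to the actual distribution.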