
CTK: Champollion Tool Kit

Built around LDC's champollion sentence aligner kernel, Champollion Tool Kit (CTK) aims to provide ready-to-use parallel text sentence alignment tools for as many language pairs as possible.

Champollion depends heavily on lexical information, but uses sentence length information as well. A translation lexicon is required. Past experiments indicate that champollion's performance improves as the translation lexicon becomes larger.
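To make the idea of combining lexical and length information concrete, here is a minimal sketch in Python. It is not champollion's actual scoring function or code (the toolkit itself is written in Perl). The function names, the toy French-English lexicon, and the simple "overlap times length ratio" formula are all assumptions chosen for illustration only.

```python
# Illustrative sketch only: NOT champollion's real algorithm.
# All names and the scoring formula below are hypothetical.

def lexical_overlap(src_tokens, tgt_tokens, lexicon):
    """Fraction of source tokens that have a lexicon translation in the target."""
    if not src_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    hits = sum(1 for tok in src_tokens if lexicon.get(tok, set()) & tgt_set)
    return hits / len(src_tokens)


def length_ratio(src_tokens, tgt_tokens):
    """Shorter length divided by longer length, a value in [0, 1]."""
    longer = max(len(src_tokens), len(tgt_tokens))
    if longer == 0:
        return 1.0
    return min(len(src_tokens), len(tgt_tokens)) / longer


def similarity(src_tokens, tgt_tokens, lexicon):
    """Lexical evidence dominates; length mismatch scales the score down."""
    return lexical_overlap(src_tokens, tgt_tokens, lexicon) * length_ratio(
        src_tokens, tgt_tokens
    )


# Toy data (hypothetical): a small lexicon and a larger one.
small_lexicon = {"chat": {"cat"}, "noir": {"black"}, "dort": {"sleeps"}}
large_lexicon = dict(small_lexicon, le={"the"})

source = ["le", "chat", "noir", "dort"]
good_target = ["the", "black", "cat", "sleeps"]
bad_target = ["it", "rains"]

score_small = similarity(source, good_target, small_lexicon)
score_large = similarity(source, good_target, large_lexicon)
score_bad = similarity(source, bad_target, large_lexicon)
```

In this toy setup the correct pairing scores higher with the larger lexicon than with the smaller one, and the unrelated pairing scores zero, which mirrors the observation that a larger translation lexicon gives the aligner more evidence to work with.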

All source code was written in Perl.

Contributions are very welcome, especially the following:

Please be aware that although champollion is designed for aligning noisy parallel text (text with deletions and insertions), it is not capable of aligning comparable text.

What's New

[2005-08-25] CTK 1.1 released!

[2004-07-01] CTK 1.0 released!

Downloads

All software is available from: http://sourceforge.net/projects/champollion/.

All software is licensed under the OSI-approved GNU General Public License. Please contact us if you would like the software under another license.

Available Language Pairs

Content of Distributions

Mailing Lists

Documentation

Minimal documentation is available in the README file in the distribution package. Full documentation is coming soon!

Research Papers

http://papers.ldc.upenn.edu/LREC2006/Champollion.pdf
Xiaoyi Ma
Champollion: A Robust Parallel Text Sentence Aligner
LREC 2006: Fifth International Conference on Language Resources and Evaluation, Genova, Italy, 2006

Linguistic Data Consortium

CTK-related efforts are based at the Linguistic Data Consortium at the University of Pennsylvania. The research and development of CTK is funded by the TIDES Machine Translation Project.