SRILM - an extensible language modeling toolkit
Related papers
A Language Modelling Tool for Statistical NLP
Anais do V Workshop em Tecnologia da …, 2007
In recent years the use of statistical language models (SLMs) has become widespread in most NLP fields. In this work we introduce jNina, a basic language modelling tool to aid the development of Machine Translation systems and many other text-generating applications. The tool allows for the quick comparison of multiple text outputs (e.g., alternative translations of a single source) based on a given SLM, and enables the user to build and evaluate her own SLMs from any corpora provided.
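The comparison workflow this abstract describes — scoring alternative outputs against an SLM built from user-supplied corpora — can be sketched with a toy smoothed bigram model. This is an illustration of the general idea only, not jNina's actual API; all names below are invented:

```python
import math
from collections import Counter

def train_bigram(corpus, alpha=1.0):
    """Train an add-alpha smoothed bigram model from tokenized sentences
    and return a sentence log-probability function."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)

    def logprob(sent):
        toks = ["<s>"] + sent + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab))
            for a, b in zip(toks, toks[1:])
        )
    return logprob

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog".split(),
]
score = train_bigram(corpus)

# Rank alternative outputs of a hypothetical MT system by LM score.
candidates = ["the cat sat on the rug".split(),
              "rug the on sat cat the".split()]
best = max(candidates, key=score)
print(" ".join(best))  # the fluent candidate wins
```

The fluent candidate is preferred because all of its bigrams were observed in the training corpus, while the scrambled one relies entirely on smoothing mass.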
Language models: where are the bottlenecks?
AISB Quarterly, 1994
Statistical, parsing, database, and other methods of bringing contextual information to bear on the recognition task are described in a uniform framework in which the central data structure mediating between the recognition and the contextual components is a segment lattice, a directed graph that contains the alternative segments and their confidence/probability ranking. Explicit measures of the value of such segment lattices and the correctness of language models are proposed, and the dominant technologies are critically evaluated.
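The segment lattice described above can be illustrated as a small directed acyclic graph whose edges carry alternative segments with log-domain confidence scores; the contextual component then searches for the best-scoring path. This is a hypothetical toy example, not the paper's actual data structure:

```python
# Nodes are time points; each edge is (segment, next_node, log_score).
edges = {
    0: [("recognize", 2, -1.2), ("wreck a", 2, -1.8)],
    2: [("speech", 4, -0.9), ("nice speech", 4, -1.5)],
}

def best_path(node, goal=4):
    """Return (score, segments) of the highest-scoring path to goal."""
    if node == goal:
        return 0.0, []
    return max(
        ((s + best_path(nxt, goal)[0], [seg] + best_path(nxt, goal)[1])
         for seg, nxt, s in edges[node]),
        key=lambda t: t[0],
    )

print(best_path(0))
```

The winning path here is the one with the least total log-domain cost; a real recognizer would combine acoustic and language-model scores on these edges.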
5 Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies
1995
An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a binary top-down form of word clustering which employs an average class mutual information metric. Resulting classifications are hierarchical, allowing variable class granularity. Words are represented as structural tags: unique n-bit numbers whose most significant bit-patterns incorporate class information. Access to a structural tag immediately provides access to all classification levels for the corresponding word. The classification system has successfully revealed some of the structure of English, from the phonemic to the semantic level. The system has been compared, directly and indirectly, with other recent word classification systems. Class-based interpolated language models have been constructed to exploit the extra information supplied by the classifications, and some experiments have shown that the new models improve model performance.
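The structural-tag idea — an n-bit number whose most significant bits encode the coarsest class splits, so that a prefix comparison yields class membership at any granularity — can be sketched as follows. Tag width, tree paths, and words are invented for illustration:

```python
def structural_tag(path, width=8):
    """Pack a root-to-leaf path in a binary cluster tree into an n-bit tag.
    path is a sequence of 0/1 branch choices; the most significant bits
    hold the coarsest class splits."""
    tag = 0
    for bit in path:
        tag = (tag << 1) | bit
    return tag << (width - len(path))  # left-align within the tag word

def same_class(tag_a, tag_b, level, width=8):
    """Two words share a class at `level` if their top `level` bits agree."""
    shift = width - level
    return (tag_a >> shift) == (tag_b >> shift)

# hypothetical 8-bit tags for three words
cat = structural_tag([0, 1, 1, 0])
dog = structural_tag([0, 1, 1, 1])
run = structural_tag([1, 0, 0, 1])

print(same_class(cat, dog, 3))  # shared class three levels down
print(same_class(cat, run, 1))  # split apart at the very top
```

Because class membership at every level is a bit-prefix test, no table lookup is needed to move between class granularities, which is the access property the abstract highlights.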
Beyond the Conventional Statistical Language Models: The Variable-Length Sequences Approach
In natural language, certain sequences of words are very frequent. A classical language model, like an n-gram, does not adequately account for such sequences, because it underestimates their probabilities. A better approach is to model word sequences as if they were individual dictionary elements. Sequences are treated as additional entries of the word lexicon, over which language models are computed. In this paper, we present an original method for automatically determining the most important phrases in corpora. This method is based on information-theoretic criteria, which ensure high statistical consistency, and on French grammatical classes, which introduce additional types of linguistic dependency. In addition, perplexity is used to make the decision to select a potential sequence more accurate. We also propose several variants of language models with and without word sequences. Among them, we present a model in which the trigger pairs are more linguistically significant. The originality of this model, compared with commonly used trigger approaches, is the use of word sequences to estimate the trigger pairs without being limited to single words. Experimental tests, in terms of perplexity and recognition rate, are carried out on a vocabulary of 20,000 words and a corpus of 43 million words. The use of the word sequences proposed by our algorithm reduces perplexity by more than 16% compared to models limited to single words. The introduction of these word sequences into our dictation machine improves accuracy by approximately 15%.
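The information-theoretic phrase selection step can be illustrated with pointwise mutual information over adjacent word pairs — a simplified stand-in for the paper's actual criteria; the threshold and corpus below are invented:

```python
import math
from collections import Counter

def candidate_phrases(corpus, threshold=1.0):
    """Rank adjacent word pairs by pointwise mutual information (PMI);
    pairs above the threshold become candidate multi-word lexicon entries."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
        total += len(sent)
    phrases = {}
    for (a, b), n in bigrams.items():
        # PMI = log2( p(a,b) / (p(a) * p(b)) ), estimated from counts
        pmi = math.log2(n * total / (unigrams[a] * unigrams[b]))
        if pmi > threshold:
            phrases[(a, b)] = round(pmi, 2)
    return phrases

corpus = [
    "new york is a big city".split(),
    "he moved to new york".split(),
    "a new car and a new house".split(),
]
phrases = candidate_phrases(corpus)
print(phrases)  # ("new", "york") scores high: the pair co-occurs reliably
```

A production system, as in the paper, would additionally filter candidates by grammatical class and confirm each selection by its effect on perplexity.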
Statistical feature language model
Proc. ICSLP, 2004
Statistical language models are widely used in automatic speech recognition in order to constrain the decoding of a sentence. Most of these models derive from the classical n-gram paradigm. However, the production of a word depends on a large set of linguistic features ...
Source Code: Querying and Serving N-gram Language Models with Python
Statistical n-gram language modeling is a very important technique in Natural Language Processing (NLP) and Computational Linguistics used to assess the fluency of an utterance in any given language. It is widely employed in several important NLP applications such as Machine Translation and Automatic Speech Recognition. However, the most commonly used toolkit (SRILM) to build such language models on a large scale is written entirely in C++ which presents a challenge to an NLP developer or researcher whose primary language of choice is Python. This article first provides a gentle introduction to statistical language modeling. It then describes how to build a native and efficient Python interface (using SWIG) to the SRILM toolkit such that language models can be queried and used directly in Python code. Finally, it also demonstrates an effective use case of this interface by showing how to leverage it to build a Python language model server. Such a server can prove to be extremely useful when the language model needs to be queried by multiple clients over a network: the language model must only be loaded into memory once by the server and can then satisfy multiple requests. This article supplements the primary article and provides the entire set of source code listings along with appropriate technical comments where necessary. Some of the listings may already be included with the primary article (in complete or excerpted form) but are reproduced here for the sake of completeness.
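The server pattern described here — load the model once, then answer scoring requests from many clients over a network — can be sketched with Python's standard `socketserver` module. The scoring function below is a placeholder for the SWIG-wrapped SRILM calls the article actually uses:

```python
import socket
import socketserver
import threading

# Stand-in for the SWIG-wrapped SRILM model: a real server would call
# the wrapped n-gram probability functions here instead.
def logprob(sentence):
    return -2.0 * len(sentence.split())  # hypothetical per-word cost

class LMHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Protocol: one sentence per line in, one log-probability out.
        for line in self.rfile:
            sent = line.decode().strip()
            self.wfile.write(f"{logprob(sent)}\n".encode())

# The model is loaded once by the server process and shared by clients.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), LMHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as conn:
    conn.sendall(b"the cat sat\n")
    reply = conn.makefile().readline().strip()
print(reply)  # score computed server-side
server.shutdown()
```

The threading server lets multiple clients query concurrently while only one copy of the (potentially multi-gigabyte) model sits in memory, which is the motivation the abstract gives.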
Towards Improved Language Model Evaluation Measures
1999
Much recent research has demonstrated that the correlation between a language model's perplexity and its effect on the word error rate of a speech recognition system is not as strong as was once thought. This represents a major problem for those involved in developing language models. This paper describes the development of new measures of language model quality. These measures retain the ease of computation and task independence that are perplexity's strengths, yet are considerably better correlated with word error rate. This paper also shows that mixture-based language models are improved by applying interpolation weights which are optimised with respect to these new measures, rather than a maximum likelihood criterion.
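For reference, perplexity — the baseline measure whose ease of computation these new measures aim to retain — is just the exponentiated average per-word cross-entropy. The per-word probabilities below are invented for illustration:

```python
def perplexity(log2_probs):
    """Perplexity = 2 ** (average per-word cross-entropy in bits)."""
    entropy = -sum(log2_probs) / len(log2_probs)
    return 2 ** entropy

# hypothetical per-word log2 probabilities for a four-word test sentence
log2_probs = [-6.0, -7.5, -5.0, -9.5]
print(perplexity(log2_probs))  # → 128.0 (average cost is 7 bits/word)
```

The appeal of this quantity is exactly what the abstract notes: it needs only the model's probabilities on test text, with no recognizer in the loop; its weakness is that it ignores acoustic confusability, which is why it correlates imperfectly with word error rate.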
Improving language models by using distant information
2007 9th International Symposium on Signal Processing and Its Applications, 2007
This study examines an original way of taking advantage of distant information in statistical language models. We show that it is possible to use n-gram models that consider histories different from those used during training. These models are called crossing context models. Our study deals with classical and distant n-gram models. A mixture of four models is proposed and evaluated. A bigram linear mixture achieves an improvement of 14% in terms of perplexity. Moreover, the trigram mixture outperforms the standard trigram by 5.6%. These improvements have been obtained without increasing the complexity of standard n-gram models. The resulting mixture language model has been integrated into a speech recognition system. Its evaluation shows a slight improvement in terms of word error rate on the data used for the francophone evaluation campaign ESTER. Finally, the impact of the proposed crossing context language models on performance is presented across various speakers.
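The linear mixture idea can be illustrated by interpolating per-word probabilities from a standard and a distant n-gram model and comparing perplexities. All probabilities and weights below are invented, not the paper's estimates:

```python
import math

def perplexity(probs):
    """Perplexity from per-word probabilities."""
    return 2 ** (-sum(math.log2(p) for p in probs) / len(probs))

def mixture(p_std, p_dist, lam=0.7):
    """Linear interpolation of a standard and a distant n-gram model."""
    return [lam * a + (1 - lam) * b for a, b in zip(p_std, p_dist)]

# hypothetical per-word probabilities from each component model
p_std  = [0.20, 0.01, 0.30, 0.05]
p_dist = [0.10, 0.15, 0.05, 0.20]

ppl_std = perplexity(p_std)
ppl_mix = perplexity(mixture(p_std, p_dist))
print(round(ppl_std, 2), round(ppl_mix, 2))  # mixture perplexity is lower
```

The gain comes from complementarity: where the standard model assigns a very low probability, the distant-context model often does not, so interpolation smooths out the worst predictions without changing the component models themselves.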
One billion word benchmark for measuring progress in statistical language modeling
We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves perplexity 74.4. A combination of techniques leads to 37% reduction in perplexity, or 11% reduction in cross-entropy (bits), over that baseline.
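The two reported reductions are consistent with each other, since perplexity is the exponentiated cross-entropy: a 37% perplexity drop from the 74.4 baseline corresponds to roughly an 11% drop in bits. A quick check:

```python
import math

# perplexity = 2 ** cross_entropy, so the two reported reductions
# should agree: a 37% perplexity drop should be ~11% fewer bits.
baseline_ppl = 74.4
combined_ppl = baseline_ppl * (1 - 0.37)  # 37% perplexity reduction

H_base = math.log2(baseline_ppl)          # ~6.22 bits per word
H_comb = math.log2(combined_ppl)
reduction = 100 * (H_base - H_comb) / H_base
print(round(reduction, 1))  # ~11% reduction in cross-entropy
```

This is also why large relative perplexity gains translate to much smaller relative cross-entropy gains: the logarithm compresses the scale.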