Ted Pedersen - Free Software for Natural Language Processing from the NLP group at UMD

[This page is out of date. Please contact me for more current info.]

This is a directory of software developed by the Natural Language Processing Group at the University of Minnesota, Duluth. It is mostly in Perl, and always freely available under the terms of the GNU General Public License (GPL). Many of these projects are available via CPAN and SourceForge.

Unsupervised Corpus Based Clustering of Similar Contexts

SenseClusters

SenseClusters is a package of Perl programs that allows a user to cluster similar contexts together using unsupervised knowledge-lean methods. These techniques have been applied to word sense discrimination, email categorization, and name discrimination.

Collocation Identification

Ngram Statistics Package (NSP)

NSP allows you to identify word n-grams in large corpora using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, and the Dice Coefficient.
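As a rough illustration of what a measure of association computes, the sketch below scores bigrams with the Dice Coefficient, 2 * n11 / (n1p + np1), where n11 is the number of times the bigram occurs, n1p the number of bigrams that start with its first word, and np1 the number that end with its second word. This is a minimal Python sketch and not NSP's own code (NSP is written in Perl); the function name `dice_scores` and the sample sentence are assumptions made up for the example.

```python
from collections import Counter

def dice_scores(tokens):
    """Score every adjacent word pair in `tokens` with the Dice Coefficient.

    Hypothetical helper for illustration only; it is not part of NSP.
    """
    # n11: how often each bigram (w1, w2) occurs.
    bigrams = Counter(zip(tokens, tokens[1:]))
    # n1p: bigrams with w1 in first position; np1: bigrams with w2 in second position.
    first, second = Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        first[w1] += n
        second[w2] += n
    return {
        (w1, w2): 2.0 * n / (first[w1] + second[w2])
        for (w1, w2), n in bigrams.items()
    }

# Made-up sample text.
tokens = "new york is a big city and new york is busy".split()
scores = dice_scores(tokens)
for bigram, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(bigram, round(score, 3))
```

A bigram whose two words only ever appear together, such as "new york" in this sample, gets the maximum score of 1.0, while pairs whose words also occur in other bigrams score lower.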

WordNet Resources

WordNet::Similarity

WordNet::Similarity allows you to measure the similarity and relatedness of two concepts in the WordNet lexical database using a variety of measures of semantic similarity and relatedness.

WordNet::SenseRelate

WordNet::SenseRelate allows you to assign meanings to each content word in a text. It does this by determining which sense of a word is most related to its neighbors.
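The idea of choosing the sense most related to a word's neighbors can be sketched as follows. This is a toy Python illustration and not the WordNet::SenseRelate implementation: the sense inventory `SENSES`, the gloss-overlap `relatedness` function, and the `window` parameter are all assumptions invented for the example, whereas a real system would draw its senses from WordNet and use a proper relatedness measure.

```python
# Hypothetical toy sense inventory: word -> {sense label: gloss}.
SENSES = {
    "bank": {
        "bank#1": "financial institution that accepts deposits of money",
        "bank#2": "sloping land beside a river or other water",
    },
    "deposit": {
        "deposit#1": "money placed in a financial account",
        "deposit#2": "layer of sediment left by water",
    },
    "money": {
        "money#1": "currency used as a medium of exchange in financial transactions",
    },
}

def relatedness(gloss_a, gloss_b):
    """Toy stand-in for a relatedness measure: number of shared gloss words."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))

def disambiguate(words, window=2):
    """Pick, for each known word, the sense most related to its neighbors."""
    result = []
    for i, word in enumerate(words):
        if word not in SENSES:
            result.append(None)  # word is not in the toy inventory
            continue
        neighbors = [
            w for j, w in enumerate(words)
            if j != i and abs(j - i) <= window and w in SENSES
        ]
        best_sense, best_score = None, -1
        for sense, gloss in SENSES[word].items():
            # For each neighbor, count only its best-matching sense.
            score = sum(
                max(relatedness(gloss, g) for g in SENSES[n].values())
                for n in neighbors
            )
            if score > best_score:
                best_sense, best_score = sense, score
        result.append(best_sense)
    return result

print(disambiguate(["deposit", "money", "bank"]))
```

With these made-up glosses, the financial senses of "deposit" and "bank" win because they share more words with the neighboring glosses than the river-related senses do.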

WordNet Utilities

A few miscellaneous programs that help us work with WordNet.

UMLS Resources

UMLS::Similarity

UMLS::Similarity allows you to measure the similarity and relatedness of two concepts in the Unified Medical Language System (UMLS) using a variety of measures of semantic similarity and relatedness.

UMLS::Interface

UMLS::Interface provides a Perl interface to the Unified Medical Language System (UMLS) and provides much of the functionality that enables UMLS::Similarity.

Supervised Methods of Word Sense Disambiguation

SenseTools

This is a suite of tools that allows for easy creation of supervised word sense disambiguation experiments.

WSD Shell

This is a greatly improved version of the Duluth-Shell as used in the DuluthX Senseval-2 systems. It makes it easier to run large numbers of experiments, and provides many detailed reporting options.

SyntaLex

This extends the Duluth Senseval-2 systems with part of speech and syntactic features. This system participated in Senseval-3 (2004).

Duluth Senseval-3 Systems

Complete source code and documentation for the Duluth systems that participated in the Senseval-3 (2004) comparative exercise among word sense disambiguation systems. This includes supervised lexical sample systems based on the Duluth Senseval-2 systems, and a new unsupervised lexical sample system.

DuluthX Senseval-2 Systems

Complete source code and documentation for the Duluth systems that participated in the lexical sample tasks of the Senseval-2 (2001) comparative exercise among word sense disambiguation systems. These systems rely on lexical features like unigrams, bigrams, and co-occurrences.

WSD Gate

This is a complete word sense disambiguation system that integrates NSP and Weka into the Gate environment.

CuiTools

This is a complete word sense disambiguation system that assigns senses to biomedical text based on the UMLS.

Data and Data Creation Tools

Senseval-2 Format Conversions

We support conversions of data in a number of formats into the Senseval-2 format for lexical sample word sense disambiguation. You can find those tools here!

Senseval-2 Formatted Data

We have converted a variety of sense-tagged text into the Senseval-2 format. We provide both copies of the converted data as well as the source code used to create it.

POS Tagging and Parsing Tools

Process Senseval-2 formatted data using the Brill POS Tagger and the Collins Parser.

Tools for Parallel Text

Tools for automatic and manual alignment of parallel text.

Web Mining

GoogleHack

GoogleHack finds sets of related words using the Google search engine.

By: Ted Pedersen - tpederse AT d umn edu