The Stanford NLP Group

The Stanford NLP Group makes some of our Natural Language Processing software available to everyone! We provide statistical NLP, deep learning NLP, and rule-based NLP tools for major computational linguistics problems, which can be incorporated into applications with human language technology needs. These packages are widely used in industry, academia, and government.

This code is actively being developed, and we try to answer questions and fix bugs on a best-effort basis.

All our supported software distributions are written in Java. Current versions of our software from October 2014 forward require Java 8+. (Versions from March 2013 to September 2014 required Java 1.6+; versions from 2005 to Feb 2013 required Java 1.5+. The Stanford Parser was first written in Java 1.1.) Distribution packages include components for command-line invocation, jar files, a Java API, and source code. You can also find us on GitHub and Maven. A number of helpful people have extended our work, with bindings or translations for other languages. As a result, much of this software can also easily be used from Python (or Jython), Ruby, Perl, JavaScript, F#, and other .NET and JVM languages.

These software distributions are open source, licensed under the GNU General Public License (v3 or later for Stanford CoreNLP; v2 or later for the other releases). Note that this is the full GPL, which allows many free uses, but _does not allow_ its incorporation (even in part or in translation) into any type of proprietary software which you distribute. Commercial licensing is also available; please contact us if you are interested. Bug fixes and code contributions are very welcome; see the contributing page on our GitHub site.

Questions

Have a support question? Please ask us on Stack Overflow using the tag stanford-nlp.

Feedback, questions, licensing issues, and bug reports / fixes can also be sent to our mailing lists (see immediately below).

Mailing Lists

We have 3 mailing lists for this tool, all of which are shared with other JavaNLP tools (with the exclusion of the parser). Each address is at @lists.stanford.edu:

java-nlp-user This is the best list to post to in order to send feature requests, make announcements, or for discussion among JavaNLP users. (Please ask support questions on Stack Overflow using the stanford-nlp tag.)
You have to subscribe to be able to use this list. Join the list via this webpage or by emailing java-nlp-user-join@lists.stanford.edu. (Leave the subject and message body empty.) You can also look at the list archives.
java-nlp-announce This list will be used only to announce new versions of Stanford JavaNLP tools. So it will be very low volume (expect 2-4 messages a year). Join the list via this webpage or by emailing java-nlp-announce-join@lists.stanford.edu. (Leave the subject and message body empty.)
java-nlp-support This list goes only to the software maintainers. It's a good address for licensing questions, etc. **For general use and support questions, you're better off using Stack Overflow or joining and using java-nlp-user.** You cannot join java-nlp-support, but you can mail questions to java-nlp-support@lists.stanford.edu.

Stanza

A Python natural language analysis package that provides implementations of fast neural network models for tokenization, multi-word token expansion, part-of-speech and morphological feature tagging, lemmatization, and dependency parsing using the Universal Dependencies formalism. Pretrained models are provided for more than 70 human languages. In addition, it is able to call the CoreNLP Java package and inherits additional functionality from there, such as constituency parsing, coreference resolution, and linguistic pattern matching.
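
As a rough illustration of the pipeline described above, the following sketch uses Stanza's Python API (`stanza.download` and `stanza.Pipeline`) to run tokenization through dependency parsing. The helper names `format_word` and `analyze`, the processor list, and the default of English are assumptions made for this example. Calling `analyze` requires the third-party `stanza` package and downloads pretrained models on first use.

```python
# Illustrative sketch only; the helper names are hypothetical, not part of Stanza.
PROCESSORS = "tokenize,mwt,pos,lemma,depparse"  # assumed processor list


def format_word(word):
    """Render one word as tab-separated text, UPOS tag, lemma, head index, relation."""
    return f"{word.text}\t{word.upos}\t{word.lemma}\t{word.head}\t{word.deprel}"


def analyze(text, lang="en"):
    """Run the neural pipeline on `text`; return one list of lines per sentence."""
    import stanza  # third-party package (pip install stanza), imported lazily

    stanza.download(lang)  # fetches pretrained models for `lang`
    nlp = stanza.Pipeline(lang, processors=PROCESSORS)
    doc = nlp(text)
    return [[format_word(w) for w in sent.words] for sent in doc.sentences]
```

A call such as `analyze("Stanford is in California.")` would return, for each sentence, one formatted line per word. The head index follows the Universal Dependencies convention in which 0 marks the root.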

Stanford POS Tagger

A maximum-entropy (CMM) part-of-speech (POS) tagger for English, Arabic, Chinese, French, German, and Spanish, in Java.

Stanford Classifier

A machine learning classifier, with good feature templates for text categorization. Provides a softmax (a.k.a., maximum entropy or multiclass logistic regression) classifier, Naive Bayes, and other options.
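
The three names mentioned above (softmax, maximum entropy, multiclass logistic regression) refer to the same computation: each class receives a score that is a weighted sum of the active features, and the scores are exponentiated and normalized into probabilities. The sketch below shows that computation in plain Python. The feature names, weights, and function names are made up for illustration; this is not the Stanford Classifier's code or API.

```python
import math


def softmax(scores):
    """Turn a {label: score} dict into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def classify(features, weights):
    """Score each label as the sum of its weights for the active features."""
    scores = {
        label: sum(w.get(f, 0.0) for f in features)
        for label, w in weights.items()
    }
    return softmax(scores)


# Hypothetical weights for a two-class text categorizer.
WEIGHTS = {
    "sports": {"word=goal": 1.5, "word=vote": -0.5},
    "politics": {"word=goal": -0.5, "word=vote": 2.0},
}
probs = classify(["word=goal"], WEIGHTS)
```

Here `probs` assigns the higher probability to `sports`, because the only active feature carries a larger weight for that label. Training such a classifier amounts to choosing the weights, which this sketch does not cover.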

Tregex, Tsurgeon, and Semgrex

Tools for matching patterns in linguistic trees (following the tgrep/tgrep2 tradition), a GUI for this, and a tree-transformation utility built on top of this matching language. Also, a similar utility for matching patterns in dependency graphs.
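
To give a feel for what matching a pattern in a linguistic tree means, here is a small stand-alone sketch. It parses a Penn Treebank style bracketing and finds every node with a given label that has a direct child with another label, which is roughly the relation a Tregex pattern such as `NP < NN` expresses. The parser and all function names are hypothetical and written for this example; Tregex itself is a Java tool with a much richer pattern language.

```python
def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append((tokens[pos], []))  # leaf word
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def subtrees(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)


def immediately_dominates(tree, parent_label, child_label):
    """Return nodes labeled parent_label that have a direct child labeled child_label."""
    return [
        n for n in subtrees(tree)
        if n[0] == parent_label and any(c[0] == child_label for c in n[1])
    ]


def leaves(tree):
    """Return the words under a node, left to right."""
    label, children = tree
    return [label] if not children else [w for c in children for w in leaves(c)]


TREE = parse_tree(
    "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT the) (NNS cats))))"
)
matches = immediately_dominates(TREE, "NP", "NN")
```

In this tree only the subject noun phrase has an `NN` child, so `matches` holds that single node; the object noun phrase contains `NNS` instead and is not selected.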

Stanford Neural Machine Translation

Latest research on neural machine translation (NMT) at the Stanford NLP Group. We release our codebase, which produces state-of-the-art results in various translation tasks such as English-German and English-Czech. In addition, to encourage reproducibility and increase transparency, we release our preprocessed data and pretrained models.

Stanford Natural Language Inference Corpus (SNLI)

The SNLI corpus is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE).
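
The following sketch shows one way to read sentence pairs and count their labels. It assumes the JSON Lines form of the corpus, with one JSON object per line carrying the fields `sentence1` (premise), `sentence2` (hypothesis), and `gold_label`, where `-` marks pairs without a consensus label. The sample records and the function name `read_pairs` are made up for illustration and are not taken from the corpus.

```python
import json
from collections import Counter

LABELS = ("entailment", "contradiction", "neutral")

# Made-up records in the assumed JSON Lines layout (one JSON object per line).
SAMPLE = """\
{"gold_label": "entailment", "sentence1": "A man plays a guitar.", "sentence2": "A person makes music."}
{"gold_label": "contradiction", "sentence1": "A man plays a guitar.", "sentence2": "The man is asleep."}
{"gold_label": "neutral", "sentence1": "A man plays a guitar.", "sentence2": "The man is on a stage."}
{"gold_label": "-", "sentence1": "A dog runs.", "sentence2": "An animal is outside."}
"""


def read_pairs(lines):
    """Yield (premise, hypothesis, label) for lines carrying one of the three labels."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if record["gold_label"] in LABELS:
            yield record["sentence1"], record["sentence2"], record["gold_label"]


pairs = list(read_pairs(SAMPLE.splitlines()))
label_counts = Counter(label for _, _, label in pairs)
```

To process a real file, pass an open file handle to `read_pairs` instead of `SAMPLE.splitlines()`.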

Phrasal

A state-of-the-art phrase-based machine translation system.

Topic Modeling Toolbox (TMT)

A suite of topic modeling tools for social scientists and others who wish to perform analysis on datasets that have a substantial textual component. Unfortunately, this software is no longer developed or supported.

Entailment-based MT Evaluation Software

Software to predict the adequacy of MT system output. The scoring is based on assessing the quality of entailment between the system output and the reference translation.