SFST
What is SFST?
SFST is a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology.
The SFST tools comprise
- a compiler which translates transducer programs into minimised transducers
- interactive and batch-mode analysis programs
- tools for comparing and printing transducers
- an efficient C++ transducer library
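To make the idea of transducer-based morphological analysis concrete, here is a minimal sketch in Python. It is not SFST code and does not use the SFST library or its programming language. The transition table, the tag names such as `<V>` and `<3sg>`, and the function `analyse` are all hypothetical and chosen only for illustration. The sketch pairs a surface word with its analysis strings by walking a small hand-written transducer.

```python
# Toy finite-state transducer. Everything here is hypothetical and
# illustrative; it is not the SFST API or the SFST transducer format.
# Each transition is (surface symbol, analysis symbol, target state),
# where "" stands for the empty symbol (epsilon).
TRANSITIONS = {
    0: [("w", "w", 1)],
    1: [("a", "a", 2)],
    2: [("l", "l", 3)],
    3: [("k", "k", 4)],
    4: [("", "<V>", 5), ("s", "<V><3sg>", 6), ("e", "<V><past>", 7)],
    7: [("d", "", 6)],
}
FINAL_STATES = {5, 6}


def analyse(word, state=0, output=""):
    """Return all analysis strings the toy transducer pairs with `word`."""
    results = []
    # Accept when the whole surface string is consumed in a final state.
    if not word and state in FINAL_STATES:
        results.append(output)
    for surface, analysis, target in TRANSITIONS.get(state, []):
        if surface == "":
            # Epsilon on the surface side: emit output without consuming input.
            results += analyse(word, target, output + analysis)
        elif word.startswith(surface):
            results += analyse(word[len(surface):], target, output + analysis)
    return results


print(analyse("walked"))  # ['walk<V><past>']
```

A real toolkit would compile such a transducer from a lexicon and rules and minimise it. The hand-written table above only shows the lookup step.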
Features
- freely available under the GNU General Public License
- easy to learn for users who are familiar with grep, sed, or Perl
- efficient implementation in C++
- supports
  - a wide range of transducer operations
  - UTF-8 character coding
  - weighted transducers (basic functionality only)
Downloads
- Source code of the SFST tools
  - version 1.4.7g (only minor changes)
  - version 1.4.7e (Empty lines in lexicon files are now ignored.)
  - version 1.4.7d (fst-infl2 now allows you to print all analyses on a single line by specifying a new delimiter.)
  - version 1.4.7b (the replacement operation now correctly works with alphabets that contain non-identity mappings, problems with incompatible alphabets solved in fst-parse)
  - version 1.4.6j (downward replacement is now the exact opposite of upward replacement)
  - version 1.4.6h (comments are now optionally allowed in the lexicon, faster fault-tolerant lookup)
  - version 1.4.6a (Improvement of the efficiency of the minimisation and composition operations. Many thanks to Anssi Yli-Jyrä for his support!)
  - version 1.4.4 (Bug related to multi-character symbols in the input was fixed.)
  - version 1.4.3 (Optional replace operations have changed)
  - version 1.4.2 (includes Hopcroft minimisation and other modifications which were jointly developed with the HFST team at Helsinki)
  - version 1.3 (fst-print now produces a different output format which might affect the graphical viewers listed below)
  - version 1.2
- A short manual (included in the source code package)
- A tutorial on the implementation of computational morphologies (included in the source code package)
- SMOR, a German finite-state morphology which is based on SFST.
- LatMor, a Latin finite-state morphology with vowel length information.
- EMOR, an English finite-state morphology using SFST.
- TRMOR, a Turkish finite-state morphology created by Ayla Kayabas and documented in this paper.
- mlmorph, a Malayalam finite-state morphology created by Santhosh Thottingal.
- yakutmorph, a Yakut finite-state morphology created by Nicolas Cortegoso Vissio.
- A Debian package for SFST (created by Francis Tyers)
- A Homebrew formula for installing SFST on Macs (contributed by Nathan Glenn)
- Python bindings for SFST focusing on transducer usage (contributed by Gregor Middell)
- SFST source code with Python bindings (repository created by Santhosh Thottingal)
- Software for finding potential errors in your SFST code (created by Eleonora Nagy)
Publications
Please cite the following publication if you want to refer to the SFST tools:
A Programming Language for Finite State Transducers, Proceedings of the 5th International Workshop on Finite State Methods in Natural Language Processing (FSMNLP 2005), Helsinki, Finland. (pdf)
Relations to other FST Toolkits
There are two projects which aim to extend the functionality of SFST in various ways:
- Anssi Yli-Jyrä's AFST toolkit is based on SFST
- The HFST toolkit developed by Krister Lindén, Kimmo Koskenniemi, and colleagues was implemented on top of the three alternative FST libraries SFST, OpenFST, and foma.
See also the contributions by other authors below.
Links
- Alex Linke provided an interface to the Graphviz tool for the graphical output of transducers.
- Stefan Evert also sent me a Graphviz converter.
- Matthias Kistler provided a highlighting mode for the VIM editor.
- Marius L. Jøhndal created a Ruby interface for the SFST library.
- UIMA wrapper for SFST (developed at the UKP Lab)
Please send comments, suggestions and bug reports to Helmut Schmid at LastName@cis.uni-muenchen.de. (Insert the name into the email address.)