WordTokenizer
java.lang.Object
- weka.core.tokenizers.Tokenizer
- weka.core.tokenizers.CharacterDelimitedTokenizer
- weka.core.tokenizers.WordTokenizer
All Implemented Interfaces:
java.io.Serializable, java.util.Enumeration, OptionHandler, RevisionHandler
public class WordTokenizer
extends CharacterDelimitedTokenizer
A simple tokenizer that uses the java.util.StringTokenizer class to tokenize strings.
Valid options are:
-delimiters
The delimiters to use
(default ' \r\n\t.,;:'"()?!').
Version: Revision: 1.4
Author:
FracPete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form
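The default delimiter set quoted in the option description can be tried directly against java.util.StringTokenizer, the JDK class this tokenizer is described as using. The sketch below is a minimal illustration under that assumption; the class name `DefaultDelimitersDemo` and the helper `split` are hypothetical and are not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class (not part of Weka): applies the documented default
// delimiter set with the plain JDK StringTokenizer.
class DefaultDelimitersDemo {
    // The default delimiters from the option description, as a Java literal:
    // space, \r, \n, \t, and the characters . , ; : ' " ( ) ? !
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    // Splits text on any of the delimiter characters; delimiters are discarded.
    static List<String> split(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello, world, It, s, a, test]
        System.out.println(split("Hello, world! (It's a test.)", DEFAULT_DELIMITERS));
    }
}
```

Because the apostrophe and the parentheses are in the default set, "It's" is split into "It" and "s". Passing a narrower delimiter string keeps such words intact.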
Constructor Summary
Constructors
Constructor and Description
WordTokenizer()
Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `java.lang.String` | `getRevision()` Returns the revision string. |
| `java.lang.String` | `globalInfo()` Returns a string describing the tokenizer. |
| `boolean` | `hasMoreElements()` Tests if this enumeration contains more elements. |
| `static void` | `main(java.lang.String[] args)` Runs the tokenizer with the given options and strings to tokenize. |
| `java.lang.Object` | `nextElement()` Returns the next element of this enumeration if this enumeration object has at least one more element to provide. |
| `void` | `tokenize(java.lang.String s)` Sets the string to tokenize. |

* ### Methods inherited from class weka.core.tokenizers.[CharacterDelimitedTokenizer](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html "class in weka.core.tokenizers")
  [delimitersTipText](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#delimitersTipText--), [getDelimiters](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#getDelimiters--), [getOptions](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#getOptions--), [listOptions](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#listOptions--), [setDelimiters](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#setDelimiters-java.lang.String-), [setOptions](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#setOptions-java.lang.String:A-)
* ### Methods inherited from class weka.core.tokenizers.[Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")
  [runTokenizer](../../../weka/core/tokenizers/Tokenizer.html#runTokenizer-weka.core.tokenizers.Tokenizer-java.lang.String:A-), [tokenize](../../../weka/core/tokenizers/Tokenizer.html#tokenize-weka.core.tokenizers.Tokenizer-java.lang.String:A-)
* ### Methods inherited from class java.lang.Object
  `equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
Constructor Detail
* #### WordTokenizer
  `public WordTokenizer()`
Method Detail
* #### globalInfo
  `public java.lang.String globalInfo()`
  Returns a string describing the tokenizer.
  Specified by: [globalInfo](../../../weka/core/tokenizers/Tokenizer.html#globalInfo--) in class [Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")
  Returns: a description suitable for displaying in the explorer/experimenter GUI
* #### hasMoreElements
  `public boolean hasMoreElements()`
  Tests if this enumeration contains more elements.
  Specified by: `hasMoreElements` in interface `java.util.Enumeration`
  Specified by: [hasMoreElements](../../../weka/core/tokenizers/Tokenizer.html#hasMoreElements--) in class [Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")
  Returns: true if and only if this enumeration object contains at least one more element to provide; false otherwise.
* #### nextElement
  `public java.lang.Object nextElement()`
  Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
  Specified by: `nextElement` in interface `java.util.Enumeration`
  Specified by: [nextElement](../../../weka/core/tokenizers/Tokenizer.html#nextElement--) in class [Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")
  Returns: the next element of this enumeration.
* #### tokenize
  `public void tokenize(java.lang.String s)`
  Sets the string to tokenize. Tokenization happens immediately.
  Specified by: [tokenize](../../../weka/core/tokenizers/Tokenizer.html#tokenize-java.lang.String-) in class [Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")
  Parameters: `s` - the string to tokenize
* #### getRevision
  `public java.lang.String getRevision()`
  Returns the revision string.
  Returns: the revision
* #### main
  `public static void main(java.lang.String[] args)`
  Runs the tokenizer with the given options and strings to tokenize. The tokens are printed to stdout.
  Parameters: `args` - the command-line options and strings to tokenize
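
The methods documented above suggest a simple calling pattern: pass a string to `tokenize`, then drain the tokens through `hasMoreElements` and `nextElement`. The following sketch reproduces that pattern with a hypothetical stand-in class, `SketchTokenizer`, built on java.util.StringTokenizer so that it runs without Weka on the classpath. It mirrors only the documented method names and is not Weka's implementation. Its behaviour before `tokenize` is called is an assumption, as is the helper `collect`.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical stand-in that mirrors the documented method names.
// It is NOT Weka's WordTokenizer; it only illustrates the calling pattern.
class SketchTokenizer implements Enumeration<Object> {
    private final String delimiters;
    private StringTokenizer inner; // stays null until tokenize() is called

    SketchTokenizer(String delimiters) {
        this.delimiters = delimiters;
    }

    // Sets the string to tokenize; any earlier input is discarded.
    public void tokenize(String s) {
        inner = new StringTokenizer(s, delimiters);
    }

    @Override
    public boolean hasMoreElements() {
        // Assumption: no tokens are available before tokenize() is called.
        return inner != null && inner.hasMoreTokens();
    }

    @Override
    public Object nextElement() {
        if (inner == null) {
            throw new NoSuchElementException("tokenize() has not been called");
        }
        return inner.nextToken(); // throws NoSuchElementException when exhausted
    }

    // Hypothetical convenience helper: tokenizes s and collects every token.
    static List<String> collect(SketchTokenizer tokenizer, String s) {
        List<String> tokens = new ArrayList<>();
        tokenizer.tokenize(s);
        while (tokenizer.hasMoreElements()) {
            tokens.add((String) tokenizer.nextElement());
        }
        return tokens;
    }

    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SketchTokenizer(" \r\n\t.,;:'\"()?!");
        // Prints one token per line, similar in spirit to the documented main().
        for (String token : collect(tokenizer, "The quick, brown fox.")) {
            System.out.println(token);
        }
    }
}
```

With the real class, the same loop would presumably start from `new WordTokenizer()` and, if needed, a call to the inherited `setDelimiters` before `tokenize`. Since `nextElement` is declared to return `java.lang.Object`, callers have to convert each element to a string themselves.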