WordTokenizer

java.lang.Object
- weka.core.tokenizers.Tokenizer
  - weka.core.tokenizers.CharacterDelimitedTokenizer
    - weka.core.tokenizers.WordTokenizer
All Implemented Interfaces:
java.io.Serializable, java.util.Enumeration, OptionHandler, RevisionHandler

public class WordTokenizer
extends CharacterDelimitedTokenizer
A simple tokenizer that uses the java.util.StringTokenizer class to split strings into tokens.
Valid options are:
-delimiters
The delimiters to use
(default ' \r\n\t.,;:'"()?!').
Version: Revision: 1.4
Author:
FracPete (fracpete at waikato dot ac dot nz)
See Also:
Serialized Form
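
The effect of the default delimiter set can be tried out with `java.util.StringTokenizer` alone. The following is a minimal sketch, not the Weka implementation: the class name `DelimiterSketch` and the helper `split` are hypothetical and exist only for illustration, while the delimiter string is the default quoted for the `-delimiters` option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Stand-alone sketch: it uses only java.util.StringTokenizer, not the Weka class.
final class DelimiterSketch {

    // The default delimiter set quoted in the -delimiters option description.
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    // Splits text on any of the given delimiter characters.
    // StringTokenizer never returns empty tokens, so runs of delimiters collapse.
    static List<String> split(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world! (It's a test.)";
        // Prints [Hello, world, It, s, a, test]
        System.out.println(split(text, DEFAULT_DELIMITERS));
        // With a single space as the only delimiter, punctuation stays attached:
        // prints [Hello,, world!, (It's, a, test.)]
        System.out.println(split(text, " "));
    }
}
```

Because the apostrophe is one of the default delimiters, contractions such as "It's" are split into two tokens; a narrower delimiter string avoids this.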

Constructor Summary

Constructors

Constructor and Description
WordTokenizer()

Method Summary


| Modifier and Type | Method | Description |
| --- | --- | --- |
| java.lang.String | getRevision() | Returns the revision string. |
| java.lang.String | globalInfo() | Returns a string describing the tokenizer. |
| boolean | hasMoreElements() | Tests if this enumeration contains more elements. |
| static void | main(java.lang.String[] args) | Runs the tokenizer with the given options and strings to tokenize. |
| java.lang.Object | nextElement() | Returns the next element of this enumeration if this enumeration object has at least one more element to provide. |
| void | tokenize(java.lang.String s) | Sets the string to tokenize. |

   * ### Methods inherited from class weka.core.tokenizers.[CharacterDelimitedTokenizer](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html "class in weka.core.tokenizers")  
   `[delimitersTipText](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#delimitersTipText--), [getDelimiters](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#getDelimiters--), [getOptions](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#getOptions--), [listOptions](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#listOptions--), [setDelimiters](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#setDelimiters-java.lang.String-), [setOptions](../../../weka/core/tokenizers/CharacterDelimitedTokenizer.html#setOptions-java.lang.String:A-)`  
   * ### Methods inherited from class weka.core.tokenizers.[Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")  
   `[runTokenizer](../../../weka/core/tokenizers/Tokenizer.html#runTokenizer-weka.core.tokenizers.Tokenizer-java.lang.String:A-), [tokenize](../../../weka/core/tokenizers/Tokenizer.html#tokenize-weka.core.tokenizers.Tokenizer-java.lang.String:A-)`  
   * ### Methods inherited from class java.lang.Object  
   `equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

 * #### WordTokenizer  
 public WordTokenizer()

Method Detail

* #### globalInfo  
public java.lang.String globalInfo()  
Returns a string describing the tokenizer.  
Specified by:  
`[globalInfo](../../../weka/core/tokenizers/Tokenizer.html#globalInfo--)` in class `[Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")`  
Returns:  
a description suitable for displaying in the Explorer/Experimenter GUI  
* #### hasMoreElements  
public boolean hasMoreElements()  
Tests if this enumeration contains more elements.  
Specified by:  
`hasMoreElements` in interface `java.util.Enumeration`  
Specified by:  
`[hasMoreElements](../../../weka/core/tokenizers/Tokenizer.html#hasMoreElements--)` in class `[Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")`  
Returns:  
true if and only if this enumeration object contains at least one more element to provide; false otherwise.  
* #### nextElement  
public java.lang.Object nextElement()  
Returns the next element of this enumeration if this enumeration object has at least one more element to provide.  
Specified by:  
`nextElement` in interface `java.util.Enumeration`  
Specified by:  
`[nextElement](../../../weka/core/tokenizers/Tokenizer.html#nextElement--)` in class `[Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")`  
Returns:  
the next element of this enumeration.  
* #### tokenize  
public void tokenize(java.lang.String s)  
Sets the string to tokenize. Tokenization happens immediately.  
Specified by:  
`[tokenize](../../../weka/core/tokenizers/Tokenizer.html#tokenize-java.lang.String-)` in class `[Tokenizer](../../../weka/core/tokenizers/Tokenizer.html "class in weka.core.tokenizers")`  
Parameters:  
`s` \- the string to tokenize  
* #### getRevision  
public java.lang.String getRevision()  
Returns the revision string.  
Returns:  
the revision  
* #### main  
public static void main(java.lang.String[] args)  
Runs the tokenizer with the given options and strings to tokenize. The tokens are printed to stdout.  
Parameters:  
`args` \- the commandline options and strings to tokenize
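
The `tokenize`, `hasMoreElements` and `nextElement` methods described above are meant to be used together: set the input once, then drain the enumeration. The sketch below illustrates that calling pattern with a hypothetical stand-in class, `EnumerationSketch`, built on `java.util.StringTokenizer`. It mirrors the documented method names only and should not be read as the actual Weka source.

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical stand-in, not the Weka class: it only mirrors the documented method names.
final class EnumerationSketch implements Enumeration<Object> {

    private final String delimiters;
    private StringTokenizer tokenizer; // null until tokenize(String) is called

    EnumerationSketch(String delimiters) {
        this.delimiters = delimiters;
    }

    // Sets the string to tokenize; any earlier input is discarded.
    void tokenize(String s) {
        tokenizer = new StringTokenizer(s, delimiters);
    }

    @Override
    public boolean hasMoreElements() {
        return tokenizer != null && tokenizer.hasMoreTokens();
    }

    @Override
    public Object nextElement() {
        if (tokenizer == null) {
            throw new NoSuchElementException("tokenize(String) has not been called");
        }
        // StringTokenizer.nextToken() itself throws NoSuchElementException when exhausted.
        return tokenizer.nextToken();
    }

    public static void main(String[] args) {
        EnumerationSketch t = new EnumerationSketch(" \r\n\t.,;:'\"()?!");
        t.tokenize("Tokenize this, please.");
        // Typical loop: check hasMoreElements() before each nextElement() call.
        while (t.hasMoreElements()) {
            System.out.println(t.nextElement());
        }
    }
}
```

With the real class, the same loop shape would apply after constructing a `WordTokenizer` and calling `tokenize(String)` on it.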
