org.apache.lucene.analysis
abstract public class: CharTokenizer
java.lang.Object
   org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.Tokenizer
         org.apache.lucene.analysis.CharTokenizer

Direct Known Subclasses:
    RussianLetterTokenizer, WhitespaceTokenizer, LetterTokenizer, LowerCaseTokenizer

An abstract base class for simple, character-oriented tokenizers.
Fields inherited from org.apache.lucene.analysis.Tokenizer:
input
Constructor:
 public CharTokenizer(Reader input) 
Method from org.apache.lucene.analysis.CharTokenizer Summary:
isTokenChar,   next,   normalize,   reset
Methods from org.apache.lucene.analysis.Tokenizer:
close,   reset
Methods from org.apache.lucene.analysis.TokenStream:
close,   next,   next,   reset
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.lucene.analysis.CharTokenizer Detail:
 abstract protected boolean isTokenChar(char c)
    Returns true iff a character should be included in a token. This tokenizer generates as tokens adjacent sequences of characters which satisfy this predicate. Characters for which this is false are used to define token boundaries and are not included in tokens.
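The boundary rule described above can be illustrated with a minimal, Lucene-free sketch. The classes `BoundarySketchTokenizer` and `LetterSketchTokenizer` are hypothetical stand-ins written for this example; they are not part of Lucene and take a `String` rather than a `Reader` to keep the sketch short. Only the role of `isTokenChar` is meant to mirror the contract documented here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (not a Lucene class): shows how a character
// predicate alone can define where tokens begin and end.
abstract class BoundarySketchTokenizer {
    // Plays the same role as CharTokenizer.isTokenChar(char).
    protected abstract boolean isTokenChar(char c);

    // Collects maximal runs of adjacent characters accepted by isTokenChar.
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);            // character belongs to the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // rejected character closes the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());   // flush a token that runs to end of input
        }
        return tokens;
    }
}

// Assumed example subclass: treats letters as token characters.
class LetterSketchTokenizer extends BoundarySketchTokenizer {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}
```

With this sketch, punctuation and spaces act purely as separators: they end the current token and never appear in the output.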
 public final Token next(Token token) throws IOException 
 protected char normalize(char c) 
    Called on each token character to normalize it before it is added to the token. The default implementation returns the character unchanged. Subclasses may override this to, for example, lowercase tokens.
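As a rough illustration of the normalization hook, the following Lucene-free sketch applies a per-character `normalize` step before each accepted character is appended. `NormalizingSketchTokenizer` and `LowerCaseSketchTokenizer` are hypothetical names invented for this example, and the `String`-based `tokenize` method is a simplification; it is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (not a Lucene class): adds a normalize hook
// that runs on every character accepted by isTokenChar.
abstract class NormalizingSketchTokenizer {
    // Plays the same role as CharTokenizer.isTokenChar(char).
    protected abstract boolean isTokenChar(char c);

    // Default hook: returns the character unchanged.
    protected char normalize(char c) {
        return c;
    }

    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(normalize(c)); // normalize before adding to the token
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Assumed example subclass: letters are token characters, lowercased on the way in.
class LowerCaseSketchTokenizer extends NormalizingSketchTokenizer {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}
```

Note that the predicate is tested on the original character and only accepted characters are normalized, so in this sketch normalization cannot change where token boundaries fall.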
 public void reset(Reader input) throws IOException