Lucene 2.3.2
org.apache.lucene.analysis
public abstract class Tokenizer
java.lang.Object
   org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.Tokenizer

Direct Known Subclasses:
    TokenTypeSinkTokenizer, TokenRangeSinkTokenizer, NGramTokenizer, ChineseTokenizer, EdgeNGramTokenizer, RussianLetterTokenizer, WhitespaceTokenizer, LetterTokenizer, WikipediaTokenizer, DateRecognizerSinkTokenizer, StandardTokenizer, LowerCaseTokenizer, SinkTokenizer, KeywordTokenizer, CJKTokenizer, CharTokenizer

A Tokenizer is a TokenStream whose input is a Reader.

This is an abstract class.

NOTE: subclasses must override at least one of next() or next(Token).

NOTE: subclasses overriding next(Token) must call Token.clear().
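The two notes above imply a reuse contract: a caller may hand the same Token instance to next(Token) on every call, so an override has to wipe it with clear() before filling it in, or stale text from the previous token leaks into the next one. The sketch below illustrates that contract without depending on Lucene. MiniToken, MiniTokenizer, WhitespaceSketchTokenizer, and TokenizeHelper are hypothetical stand-ins modeled on the description here, not the real Lucene 2.3 classes, whose Token API differs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's Token: a mutable, reusable term holder.
class MiniToken {
    final StringBuilder term = new StringBuilder();
    int startOffset;
    int endOffset;

    // Resets all fields so a reused instance carries nothing over from the last call.
    void clear() {
        term.setLength(0);
        startOffset = 0;
        endOffset = 0;
    }
}

// Hypothetical stand-in for Tokenizer: a token stream whose input is a Reader.
abstract class MiniTokenizer {
    protected Reader input;

    protected MiniTokenizer(Reader input) {
        this.input = input;
    }

    // Fills 'result' with the next token and returns it, or returns null at end of input.
    abstract MiniToken next(MiniToken result) throws IOException;
}

// Splits the input on whitespace, reading one char at a time from the Reader.
class WhitespaceSketchTokenizer extends MiniTokenizer {
    private int offset = 0; // number of chars consumed from the input so far

    WhitespaceSketchTokenizer(Reader input) {
        super(input);
    }

    @Override
    MiniToken next(MiniToken result) throws IOException {
        result.clear(); // required: 'result' may still hold the previous token
        int c;
        // Skip leading whitespace.
        while ((c = input.read()) != -1 && Character.isWhitespace(c)) {
            offset++;
        }
        if (c == -1) {
            return null; // end of stream
        }
        result.startOffset = offset;
        // Accumulate chars until whitespace or end of input.
        while (c != -1 && !Character.isWhitespace(c)) {
            result.term.append((char) c);
            offset++;
            c = input.read();
        }
        result.endOffset = offset;
        if (c != -1) {
            offset++; // the whitespace char that ended the token was consumed as well
        }
        return result;
    }
}

final class TokenizeHelper {
    private TokenizeHelper() {}

    // Drains a tokenizer, passing the same MiniToken instance to every next() call.
    static List<String> terms(MiniTokenizer tokenizer) {
        List<String> out = new ArrayList<>();
        MiniToken reusable = new MiniToken();
        try {
            for (MiniToken t = tokenizer.next(reusable); t != null; t = tokenizer.next(reusable)) {
                out.add(t.term.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    static List<String> terms(String text) {
        return terms(new WhitespaceSketchTokenizer(new StringReader(text)));
    }
}
```

If the clear() call were removed, the shared MiniToken would keep its previous text, and the second term of "hello lucene" would come back as "hellolucene".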
Field Summary
protected Reader input
    The text source for this Tokenizer.
Constructors:
 protected Tokenizer()
 protected Tokenizer(Reader input)
    Construct a token stream processing the given input.
Methods from org.apache.lucene.analysis.Tokenizer:
close(), reset(Reader)
Methods from org.apache.lucene.analysis.TokenStream:
close(), next(), next(Token), reset()
Methods from java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Method Detail (org.apache.lucene.analysis.Tokenizer):
 public void close() throws IOException
    By default, closes the input Reader.
 public void reset(Reader input) throws IOException
    Expert: Reset the tokenizer to a new reader. Typically, an analyzer (in its reusableTokenStream method) will use this to reuse a previously created tokenizer.
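reset(Reader) lets a single tokenizer instance be pointed at a new Reader instead of being rebuilt for every input. The sketch below shows that reuse pattern with hypothetical stand-ins (ResettableTokenizer, LetterRunTokenizer, ReusingAnalyzerSketch); it is not Lucene's reusableTokenStream implementation. It assumes that a subclass keeping per-stream state (here, a term counter) should override reset(Reader) to clear that state too, and it mirrors the documented default of close() closing the input Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Tokenizer, reduced to the input/reset/close contract.
abstract class ResettableTokenizer {
    protected Reader input;

    protected ResettableTokenizer(Reader input) {
        this.input = input;
    }

    // Returns the next term, or null at end of input.
    abstract String nextTerm() throws IOException;

    // Mirrors the documented default: close() closes the input Reader.
    void close() throws IOException {
        input.close();
    }

    // Points this instance at a new Reader so it can be reused.
    void reset(Reader input) throws IOException {
        this.input = input;
    }
}

// Emits maximal runs of letters; any non-letter char ends a run.
class LetterRunTokenizer extends ResettableTokenizer {
    private int termCount = 0; // per-stream state that reset(Reader) must clear

    LetterRunTokenizer(Reader input) {
        super(input);
    }

    @Override
    String nextTerm() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isLetter(c)) {
                sb.append((char) c);
            } else if (sb.length() > 0) {
                break; // a non-letter ends the current run
            }
        }
        if (sb.length() == 0) {
            return null; // no more letters in this input
        }
        termCount++;
        return sb.toString();
    }

    int termCount() {
        return termCount;
    }

    @Override
    void reset(Reader input) throws IOException {
        super.reset(input);
        termCount = 0; // forget state from the previous Reader
    }
}

// Hypothetical analyzer that caches one tokenizer and resets it for each new Reader.
class ReusingAnalyzerSketch {
    private LetterRunTokenizer cached;

    LetterRunTokenizer tokenStream(Reader reader) {
        try {
            if (cached == null) {
                cached = new LetterRunTokenizer(reader);
            } else {
                cached.reset(reader); // reuse instead of allocating a new tokenizer
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cached;
    }

    static List<String> drain(ResettableTokenizer t) {
        List<String> out = new ArrayList<>();
        try {
            for (String term = t.nextTerm(); term != null; term = t.nextTerm()) {
                out.add(term);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

The cache in ReusingAnalyzerSketch is a single field, so the sketch is not thread-safe; an analyzer shared across threads would need a separate cached tokenizer per thread.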