org.apache.lucene.analysis
abstract public class: Tokenizer
java.lang.Object
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
Direct Known Subclasses:
TokenTypeSinkTokenizer, TokenRangeSinkTokenizer, NGramTokenizer, ChineseTokenizer, EdgeNGramTokenizer, RussianLetterTokenizer, WhitespaceTokenizer, LetterTokenizer, WikipediaTokenizer, DateRecognizerSinkTokenizer, StandardTokenizer, LowerCaseTokenizer, SinkTokenizer, KeywordTokenizer, CJKTokenizer, CharTokenizer
A Tokenizer is a TokenStream whose input is a Reader.
This is an abstract class.
NOTE: subclasses must override at least one of #next() or #next(Token).
NOTE: subclasses overriding #next(Token) must call Token#clear().
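The two notes above can be illustrated with a minimal sketch. It does not use the Lucene classes: SketchToken, SketchTokenizer, SketchWhitespaceTokenizer and TokenizerSketch are hypothetical stand-ins written only for this example, and they assume that next(Token) refills a caller-supplied token and returns null at end of input. The point shown is the call to clear() at the top of next(...), before the reused token is refilled.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for illustration only; this is NOT org.apache.lucene.analysis.Token.
class SketchToken {
    private final StringBuilder term = new StringBuilder();
    private int startOffset;
    private int endOffset;

    // Plays the role of Token#clear(): wipes state left over from the previous token.
    void clear() {
        term.setLength(0);
        startOffset = 0;
        endOffset = 0;
    }

    void append(char c) { term.append(c); }
    void setOffsets(int start, int end) { startOffset = start; endOffset = end; }
    String term() { return term.toString(); }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
}

// Simplified stand-in for the abstract Tokenizer; assumed shape, not the Lucene class.
abstract class SketchTokenizer {
    protected Reader input; // the text source for this tokenizer

    SketchTokenizer(Reader input) { this.input = input; }

    // Returns the refilled token, or null when the input is exhausted.
    abstract SketchToken next(SketchToken reusableToken) throws IOException;
}

// Hypothetical subclass that splits the input on whitespace.
class SketchWhitespaceTokenizer extends SketchTokenizer {
    private int offset = 0; // number of characters consumed so far

    SketchWhitespaceTokenizer(Reader input) { super(input); }

    @Override
    SketchToken next(SketchToken reusableToken) throws IOException {
        reusableToken.clear(); // clear the reused token before refilling it
        int start = -1;
        int c;
        while ((c = input.read()) != -1) {
            offset++;
            if (Character.isWhitespace((char) c)) {
                if (start >= 0) break; // whitespace after a term ends the token
            } else {
                if (start < 0) start = offset - 1; // first character of the term
                reusableToken.append((char) c);
            }
        }
        if (start < 0) return null; // only whitespace or end of input remained
        reusableToken.setOffsets(start, start + reusableToken.term().length());
        return reusableToken;
    }
}

class TokenizerSketch {
    // Collects all terms, passing the same token instance to every call.
    static List<String> tokenize(String text) {
        SketchTokenizer tokenizer = new SketchWhitespaceTokenizer(new StringReader(text));
        SketchToken reusable = new SketchToken();
        List<String> terms = new ArrayList<>();
        try {
            for (SketchToken t = tokenizer.next(reusable); t != null; t = tokenizer.next(reusable)) {
                terms.add(t.term());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello  big world")); // prints [hello, big, world]
    }
}
```

In this sketch, dropping the clear() call would make the second term come out as "hellobig", because the reused token would still hold the previous term's characters.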
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| protected Reader | input | The text source for this Tokenizer. |
| Method from org.apache.lucene.analysis.Tokenizer Summary: |
|---|
| close, reset |
Method from org.apache.lucene.analysis.Tokenizer Detail:
public void close() throws IOException {
    input.close();
}
By default, closes the input Reader.
public void reset(Reader input) throws IOException {
    this.input = input;
}
Expert: Reset the tokenizer to a new reader. Typically, an analyzer (in its reusableTokenStream method) will use this to re-use a previously created tokenizer.
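The reuse pattern described for reset(Reader) might look roughly like the sketch below. It is self-contained and does not touch Lucene: ReusableWordTokenizer and ReusingAnalyzerSketch are hypothetical stand-ins, nextWord() is an invented helper, and the cache is a plain field rather than whatever per-thread storage a real analyzer may use. Only the close() and reset(Reader) bodies are taken from the listing above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for illustration only; this is NOT org.apache.lucene.analysis.Tokenizer.
class ReusableWordTokenizer {
    protected Reader input;

    ReusableWordTokenizer(Reader input) { this.input = input; }

    // Same bodies as the close() and reset(Reader) methods listed above.
    public void close() throws IOException { input.close(); }
    public void reset(Reader input) throws IOException { this.input = input; }

    // Hypothetical helper: next whitespace-delimited word, or null at end of input.
    String nextWord() throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace((char) c)) {
                if (word.length() > 0) break; // whitespace after a word ends it
            } else {
                word.append((char) c);
            }
        }
        return word.length() > 0 ? word.toString() : null;
    }
}

// Hypothetical analyzer-like holder: creates one tokenizer, then resets it per document.
class ReusingAnalyzerSketch {
    private ReusableWordTokenizer cached;
    int tokenizersCreated = 0; // counts allocations so the reuse is observable

    ReusableWordTokenizer reusableTokenStream(Reader reader) throws IOException {
        if (cached == null) {
            cached = new ReusableWordTokenizer(reader);
            tokenizersCreated++;
        } else {
            cached.reset(reader); // point the existing tokenizer at the new reader
        }
        return cached;
    }

    List<String> analyze(String text) {
        List<String> words = new ArrayList<>();
        try {
            ReusableWordTokenizer tokenizer = reusableTokenStream(new StringReader(text));
            for (String w = tokenizer.nextWord(); w != null; w = tokenizer.nextWord()) {
                words.add(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        ReusingAnalyzerSketch analyzer = new ReusingAnalyzerSketch();
        System.out.println(analyzer.analyze("first doc"));
        System.out.println(analyzer.analyze("second one"));
        System.out.println(analyzer.tokenizersCreated); // prints 1
    }
}
```

Because the default reset(Reader) only swaps the reader, a subclass that keeps its own buffers or offset counters would presumably need to override reset(Reader) and clear that state as well.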