org.apache.lucene.analysis
public final class LowerCaseTokenizer
java.lang.Object
   org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.Tokenizer
         org.apache.lucene.analysis.CharTokenizer
            org.apache.lucene.analysis.LetterTokenizer
               org.apache.lucene.analysis.LowerCaseTokenizer
LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. Although it is functionally equivalent to chaining LetterTokenizer and LowerCaseFilter, doing both tasks in a single pass is faster, hence this (redundant) implementation.

Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
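
The behavior described above can be illustrated with a small standalone sketch. It does not use Lucene: the class LowerCaseSketch and its tokenize method are hypothetical names introduced here, and the sketch assumes that "letter" means Character.isLetter and that lowercasing means Character.toLowerCase. It also ignores details of the real class such as token offsets and any maximum token length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class LowerCaseSketch {

    // Splits text at non-letters and lowercases each letter in the same pass.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letter: normalize it and extend the current token.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Non-letter: close the token in progress, if any.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [hello, world, it, s]
        System.out.println(tokenize("Hello, World! It's 2008."));
    }
}
```

Note that the apostrophe and the digits are non-letters, so "It's" yields two tokens and "2008" yields none.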
Fields inherited from org.apache.lucene.analysis.Tokenizer:
input
Constructor:
 public LowerCaseTokenizer(Reader in) 
    Construct a new LowerCaseTokenizer.
Method from org.apache.lucene.analysis.LowerCaseTokenizer Summary:
normalize
Methods from org.apache.lucene.analysis.LetterTokenizer:
isTokenChar
Methods from org.apache.lucene.analysis.CharTokenizer:
isTokenChar,   next,   normalize,   reset
Methods from org.apache.lucene.analysis.Tokenizer:
close,   reset
Methods from org.apache.lucene.analysis.TokenStream:
close,   next,   next,   reset
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.lucene.analysis.LowerCaseTokenizer Detail:
 protected char normalize(char c)
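
The method summaries suggest that the class hierarchy works through two overridable hooks, isTokenChar and normalize. The following is a minimal sketch of that arrangement under stated assumptions: CharSplitter, LetterSplitter, and LowerCaseSplitter are hypothetical stand-ins rather than Lucene classes, and the lowercasing override is assumed to delegate to Character.toLowerCase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a character-based tokenizer base class.
abstract class CharSplitter {
    // Hook: decides whether a character belongs to a token.
    protected abstract boolean isTokenChar(char c);

    // Hook: transforms a character before it is added to a token.
    // The default leaves the character unchanged.
    protected char normalize(char c) {
        return c;
    }

    final List<String> split(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isTokenChar(c)) {
                buf.append(normalize(c));
            } else if (buf.length() > 0) {
                out.add(buf.toString());
                buf.setLength(0);
            }
        }
        if (buf.length() > 0) {
            out.add(buf.toString());
        }
        return out;
    }
}

// Tokens are maximal runs of letters.
class LetterSplitter extends CharSplitter {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}

// Same token boundaries, but each character is lowercased as it is collected.
class LowerCaseSplitter extends LetterSplitter {
    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}
```

With this layout, new LetterSplitter().split("Foo-BAR") keeps the original case, while new LowerCaseSplitter().split("Foo-BAR") returns the same tokens lowercased, without a second pass over the text.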