org.apache.lucene.analysis.ru
public class: RussianLetterTokenizer
java.lang.Object
   org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.Tokenizer
         org.apache.lucene.analysis.CharTokenizer
            org.apache.lucene.analysis.ru.RussianLetterTokenizer
A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer by additionally looking up letters in a given "Russian charset". The problem with LetterTokenizer is that it relies on the Character.isLetter() method, which cannot detect letters in encodings like CP1252 and KOI8 (the well-known problems with the 0xD7 and 0xF7 chars).
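The 0xD7 and 0xF7 problem can be reproduced without Lucene. The following is a minimal sketch, assuming the legacy bytes reach the tokenizer as chars with the same numeric value (which is what ISO-8859-1 decoding produces). The class name IsLetterDemo and the helper decodeLatin1 are hypothetical and not part of Lucene. In KOI8-R both bytes encode Cyrillic letters, yet as chars they are U+00D7 (multiplication sign) and U+00F7 (division sign), which Character.isLetter() rejects.

```java
import java.nio.charset.StandardCharsets;

public class IsLetterDemo {
    /** Decodes one raw byte as ISO-8859-1, i.e. maps it to the char with the same numeric value. */
    static char decodeLatin1(int b) {
        return new String(new byte[] {(byte) b}, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        char c1 = decodeLatin1(0xD7); // U+00D7 MULTIPLICATION SIGN
        char c2 = decodeLatin1(0xF7); // U+00F7 DIVISION SIGN
        char c3 = decodeLatin1(0xD6); // U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS

        System.out.println(Character.isLetter(c1)); // false
        System.out.println(Character.isLetter(c2)); // false
        System.out.println(Character.isLetter(c3)); // true
    }
}
```

A tokenizer that relies only on Character.isLetter() would therefore split a word at either of these two chars, which is the gap the extra charset lookup is meant to close.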
Fields inherited from org.apache.lucene.analysis.Tokenizer:
input
Constructor:
 public RussianLetterTokenizer(Reader in,
    char[] charset) 
Method from org.apache.lucene.analysis.ru.RussianLetterTokenizer Summary:
isTokenChar
Methods from org.apache.lucene.analysis.CharTokenizer:
isTokenChar,   next,   normalize,   reset
Methods from org.apache.lucene.analysis.Tokenizer:
close,   reset
Methods from org.apache.lucene.analysis.TokenStream:
close,   next,   next,   reset
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.lucene.analysis.ru.RussianLetterTokenizer Detail:
 protected boolean isTokenChar(char c)
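
Based on the class description, isTokenChar presumably accepts a char if it is either a Unicode letter or a member of the charset passed to the constructor. The following standalone sketch illustrates that idea only; it is not the Lucene source. The class CharsetLetterSketch and its tokenize helper are hypothetical names, and the linear scan over the charset is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class CharsetLetterSketch {
    private final char[] charset;

    public CharsetLetterSketch(char[] charset) {
        // Defensive copy so later changes to the caller's array have no effect.
        this.charset = charset.clone();
    }

    /** Returns true if c is a Unicode letter or appears in the supplied charset table. */
    public boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char known : charset) {
            if (c == known) {
                return true;
            }
        }
        return false;
    }

    /** Splits text into maximal runs of chars accepted by isTokenChar. */
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Treat the two problematic chars as letters, as a charset table would.
        CharsetLetterSketch sketch = new CharsetLetterSketch(new char[] {'\u00D7', '\u00F7'});
        System.out.println(sketch.tokenize("ab\u00D7c d"));
    }
}
```

With an empty charset the same input would be split at U+00D7, so the table is what keeps such words intact.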