java.lang.Object
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.CharTokenizer
org.apache.lucene.analysis.ru.RussianLetterTokenizer

A RussianLetterTokenizer is a tokenizer that extends the behavior of LetterTokenizer by additionally looking up letters in a given "russian charset". The problem with LetterTokenizer is that it uses the Character.isLetter() method, which doesn't know how to detect letters in encodings like CP1252 and KOI8 (well-known problems with the 0xD7 and 0xF7 chars).
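
The 0xD7 and 0xF7 problem mentioned in the description can be checked directly. The sketch below only shows what Character.isLetter() returns for those two char values and for one neighbouring value. The class name IsLetterDemo and its helper are made up for illustration, and how such values end up in the input text is an assumption not covered here.

```java
// Illustrative sketch only; IsLetterDemo is a hypothetical name, not part of Lucene.
class IsLetterDemo {
    // Returns true when Character.isLetter() accepts the char with the given code.
    static boolean isLetter(int code) {
        return Character.isLetter((char) code);
    }

    public static void main(String[] args) {
        // U+00D7 is the multiplication sign and U+00F7 is the division sign,
        // so Character.isLetter() rejects both.
        System.out.println("0xD7 letter? " + isLetter(0xD7));
        System.out.println("0xF7 letter? " + isLetter(0xF7));
        // The neighbouring value 0xD6 (Latin capital O with diaeresis) is a letter.
        System.out.println("0xD6 letter? " + isLetter(0xD6));
    }
}
```

A tokenizer that relies on Character.isLetter() alone would therefore break a token at either of these two values, which is the gap the extra charset lookup is meant to close.
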
$Id: RussianLetterTokenizer.java 564236 2007-08-09 15:21:19Z gsingers $

| Fields inherited from org.apache.lucene.analysis.Tokenizer: |
|---|
| input |
| Constructor: |
|---|
|
| Method from org.apache.lucene.analysis.ru.RussianLetterTokenizer Summary: |
|---|
| isTokenChar |
| Methods from org.apache.lucene.analysis.CharTokenizer: |
|---|
| isTokenChar, next, normalize, reset |
| Methods from org.apache.lucene.analysis.Tokenizer: |
|---|
| close, reset |
| Methods from org.apache.lucene.analysis.TokenStream: |
|---|
| close, next, next, reset |
| Methods from java.lang.Object: |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method from org.apache.lucene.analysis.ru.RussianLetterTokenizer Detail: |
|---|
|
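
The method detail is not reproduced on this page. As a rough illustration of the idea in the class description (accept a char if Character.isLetter() accepts it, or if it appears in a supplied charset), here is a self-contained sketch. CharsetLetterSketch, its constructor, and tokenize are hypothetical names and do not depend on Lucene; this is not the library's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the Lucene class: it mirrors only the idea of
// "Character.isLetter() or membership in a supplied charset".
class CharsetLetterSketch {
    private final char[] charset;

    CharsetLetterSketch(char[] charset) {
        this.charset = charset.clone();
    }

    // A char belongs to a token if it is a Unicode letter or is listed in the charset.
    boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        for (char extra : charset) {
            if (c == extra) {
                return true;
            }
        }
        return false;
    }

    // Splits the text into maximal runs of token chars.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed charset for the demo: just the two problem values, 0xD7 and 0xF7.
        char[] extra = {(char) 0xD7, (char) 0xF7};
        CharsetLetterSketch sketch = new CharsetLetterSketch(extra);
        System.out.println(sketch.tokenize("ab" + (char) 0xD7 + "cd ef" + (char) 0xF7 + ", gh"));
    }
}
```

With an empty charset the same input would be cut at 0xD7 and 0xF7. With the two values supplied, each stays inside its surrounding token.
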