org.apache.lucene.analysis.ru
public class: RussianCharsets
java.lang.Object
org.apache.lucene.analysis.ru.RussianCharsets
The RussianCharsets class contains encoding schemes (charsets) and a toLowerCase() method implementation
for Russian characters in Unicode, KOI8 and CP1251.
Each encoding scheme contains lowercase (positions 0-31) and uppercase (positions 32-63) characters.
Other encoding schemes (such as ISO-8859-5 or a customized one) can be supported by adding a new charset
and adding logic for that charset to the toLowerCase() method.
- version:
$Id: RussianCharsets.java 564236 2007-08-09 15:21:19Z gsingers $
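The extension path described above (a new charset plus a matching branch in toLowerCase()) might look roughly like the following for ISO-8859-5. This is a minimal sketch, not code from RussianCharsets: the class name `Iso88595LowerCaseSketch`, the field `ISO_8859_5`, and the helper `buildTable()` are hypothetical, and the table layout simply follows the "lowercase first, uppercase second" description above. It relies on the fact that ISO-8859-5 places the 32 basic uppercase Cyrillic letters at 0xB0-0xCF and the lowercase ones at 0xD0-0xEF.

```java
// Hypothetical sketch: not part of org.apache.lucene.analysis.ru.RussianCharsets.
final class Iso88595LowerCaseSketch {

    // Assumed layout, following the class description:
    // positions 0-31 lowercase, positions 32-63 uppercase.
    static final char[] ISO_8859_5 = buildTable();

    private static char[] buildTable() {
        char[] table = new char[64];
        for (int i = 0; i < 32; i++) {
            table[i] = (char) (0xD0 + i);      // lowercase letters, 0xD0-0xEF
            table[32 + i] = (char) (0xB0 + i); // uppercase letters, 0xB0-0xCF
        }
        return table;
    }

    // Mirrors the shape of the existing branches: the charset is matched
    // by array reference, then the letter is shifted by a fixed offset.
    static char toLowerCase(char letter, char[] charset) {
        if (charset == ISO_8859_5) {
            if (letter >= 0xB0 && letter <= 0xCF) {
                return (char) (letter + 32); // uppercase -> lowercase
            }
            if (letter >= 0xD0 && letter <= 0xEF) {
                return letter;               // already lowercase
            }
        }
        return Character.toLowerCase(letter);
    }

    public static void main(String[] args) {
        // prints "d0" twice: 0xB0 is lowered to 0xD0, and 0xD0 is left unchanged
        System.out.println(Integer.toHexString(toLowerCase((char) 0xB0, ISO_8859_5)));
        System.out.println(Integer.toHexString(toLowerCase((char) 0xD0, ISO_8859_5)));
    }
}
```

In the real class the new branch would sit next to the UnicodeRussian, KOI8 and CP1251 branches rather than in a separate class.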
| Field Summary | |
|---|---|
| public static char[] | UnicodeRussian |
| public static char[] | KOI8 |
| public static char[] | CP1251 |
| Method from org.apache.lucene.analysis.ru.RussianCharsets Summary: |
|---|
| toLowerCase |
| Method from org.apache.lucene.analysis.ru.RussianCharsets Detail: |
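public static char toLowerCase(char letter,
                               char[] charset) {
    // The charset is matched by array reference, so callers must pass
    // one of the static fields of this class.
    if (charset == UnicodeRussian)
    {
        // Already lowercase.
        if (letter >= '\u0430' && letter <= '\u044F')
        {
            return letter;
        }
        // Uppercase: the lowercase form is 32 code points higher.
        if (letter >= '\u0410' && letter <= '\u042F')
        {
            return (char) (letter + 32);
        }
    }
    if (charset == KOI8)
    {
        // In KOI8 the uppercase letters occupy the higher range (0xe0-0xff).
        if (letter >= 0xe0 && letter <= 0xff)
        {
            return (char) (letter - 32);
        }
        if (letter >= 0xc0 && letter <= 0xdf)
        {
            return letter;
        }
    }
    if (charset == CP1251)
    {
        // In CP1251 the uppercase letters occupy the lower range (0xC0-0xDF).
        if (letter >= 0xC0 && letter <= 0xDF)
        {
            return (char) (letter + 32);
        }
        if (letter >= 0xE0 && letter <= 0xFF)
        {
            return letter;
        }
    }
    // Unknown charset or non-Russian letter: fall back to the JDK.
    return Character.toLowerCase(letter);
}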
|
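Because the method above selects its branch with `charset == ...`, the charset argument is matched by array identity, not by content. The sketch below illustrates that behaviour in isolation. It is a self-contained stand-in and does not call Lucene: the class `CharsetDispatchSketch` and its fields are hypothetical, the arrays hold placeholder contents, and only the Unicode and KOI8 branches are reproduced.

```java
// Hypothetical stand-in for RussianCharsets; array contents are placeholders.
final class CharsetDispatchSketch {

    static final char[] UNICODE_RUSSIAN = new char[64];
    static final char[] KOI8 = new char[64];

    // Same range logic as the Unicode and KOI8 branches shown above.
    static char toLowerCase(char letter, char[] charset) {
        if (charset == UNICODE_RUSSIAN) {
            if (letter >= '\u0430' && letter <= '\u044F') return letter;
            if (letter >= '\u0410' && letter <= '\u042F') return (char) (letter + 32);
        }
        if (charset == KOI8) {
            if (letter >= 0xe0 && letter <= 0xff) return (char) (letter - 32);
            if (letter >= 0xc0 && letter <= 0xdf) return letter;
        }
        return Character.toLowerCase(letter);
    }

    // Convenience helper (an assumption, not in the original class):
    // lowercases a whole string one char at a time.
    static String toLowerCase(String text, char[] charset) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append(toLowerCase(text.charAt(i), charset));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Uppercase Cyrillic word written with Unicode escapes.
        String upper = "\u041F\u0420\u0418\u0412\u0415\u0422";
        String lower = "\u043F\u0440\u0438\u0432\u0435\u0442";
        System.out.println(toLowerCase(upper, UNICODE_RUSSIAN).equals(lower)); // prints "true"

        // Same array reference: the KOI8 branch runs, 0xE1 becomes 0xC1.
        System.out.println(Integer.toHexString(toLowerCase((char) 0xE1, KOI8))); // prints "c1"

        // A copy is a different reference, so the call falls through to
        // Character.toLowerCase, which leaves U+00E1 unchanged.
        System.out.println(Integer.toHexString(toLowerCase((char) 0xE1, KOI8.clone()))); // prints "e1"
    }
}
```

The practical consequence is that callers should pass the static charset fields themselves; an array with identical contents but a different reference silently gets the generic Unicode lowercasing instead.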