Docjar: A Java Source and Document Engine

org.apache.lucene.analysis.ru: Javadoc index of package org.apache.lucene.analysis.ru.


Package Samples:

org.apache.lucene.analysis.ru

Classes:

RussianCharsets: Contains encoding schemes (charsets) and a toLowerCase() implementation for Russian characters in Unicode, KOI8 and CP1251. Each encoding scheme holds lowercase characters at positions 0-31 and uppercase characters at positions 32-63. Other encoding schemes (such as ISO-8859-5 or a custom one) can be supported by adding a new charset and extending toLowerCase() to handle it.
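
The table layout described for RussianCharsets can be illustrated with a minimal sketch. The class name RussianCharsetSketch, the field UNICODE_RUSSIAN and the helper buildUnicodeTable() are hypothetical names chosen for this example, and only the Unicode case is shown; the real class may organize its tables differently.

```java
public class RussianCharsetSketch {
    // Hypothetical table laid out as the description states:
    // positions 0-31 hold lowercase letters, positions 32-63 hold uppercase.
    public static final char[] UNICODE_RUSSIAN = buildUnicodeTable();

    static char[] buildUnicodeTable() {
        char[] table = new char[64];
        for (int i = 0; i < 32; i++) {
            table[i] = (char) (0x0430 + i);      // lowercase, U+0430..U+044F
            table[i + 32] = (char) (0x0410 + i); // uppercase, U+0410..U+042F
        }
        return table;
    }

    // Map an uppercase letter to its lowercase counterpart by table position.
    public static char toLowerCase(char letter, char[] charset) {
        for (int i = 32; i < 64; i++) {
            if (charset[i] == letter) {
                return charset[i - 32];
            }
        }
        return letter; // already lowercase, or not part of this charset
    }

    public static void main(String[] args) {
        char upper = (char) 0x0411; // Cyrillic capital letter BE
        char lower = toLowerCase(upper, UNICODE_RUSSIAN);
        System.out.println(Integer.toHexString(lower)); // prints 431
    }
}
```

Under this layout, supporting another encoding would amount to supplying another 64-entry table in the same order, so the position arithmetic stays unchanged.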
RussianLetterTokenizer: A tokenizer that extends LetterTokenizer by additionally looking up letters in a given Russian charset. LetterTokenizer relies on Character.isLetter(), which cannot detect letters in encodings such as CP1251 and KOI8 (the well-known problems with the 0xD7 and 0xF7 characters).
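
The Character.isLetter() problem mentioned for RussianLetterTokenizer can be reproduced in a few lines. This is a sketch under assumptions: LetterLookupSketch, RAW_CYRILLIC and isLetterIn() are hypothetical names, and the table simply holds the raw code values 0xC0-0xFF, the range where single-byte Cyrillic encodings such as KOI8-R and Windows-1251 place their basic letters.

```java
public class LetterLookupSketch {
    // Hypothetical stand-in for a Russian charset table: raw values 0xC0..0xFF.
    public static final char[] RAW_CYRILLIC = buildRawTable();

    static char[] buildRawTable() {
        char[] table = new char[64];
        for (int i = 0; i < 64; i++) {
            table[i] = (char) (0xC0 + i);
        }
        return table;
    }

    // Treat a char as a letter if it appears in the given charset table.
    public static boolean isLetterIn(char c, char[] charset) {
        for (char member : charset) {
            if (member == c) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Read as Unicode, 0xD7 and 0xF7 are the multiplication and
        // division signs, so Character.isLetter() rejects them.
        char d7 = (char) 0xD7;
        char f7 = (char) 0xF7;
        System.out.println(Character.isLetter(d7));       // prints false
        System.out.println(Character.isLetter(f7));       // prints false
        // A charset lookup accepts them as letters.
        System.out.println(isLetterIn(d7, RAW_CYRILLIC)); // prints true
        System.out.println(isLetterIn(f7, RAW_CYRILLIC)); // prints true
    }
}
```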
RussianStemFilter: A filter that stems Russian words. The implementation was inspired by GermanStemFilter. The input should be passed through RussianLowerCaseFilter before it reaches RussianStemFilter, because RussianStemFilter works only with the lowercase part of a Russian charset.
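
The ordering requirement for RussianStemFilter (lowercase first, then stem) can be seen with a toy stemmer. This sketch does not use the Lucene classes and does not implement the Snowball rules: StemOrderSketch, TOY_SUFFIXES and toyStem() are hypothetical, and the single suffix is an arbitrary stand-in chosen only to show why uppercase input passes through unstemmed.

```java
import java.util.Locale;

public class StemOrderSketch {
    // Toy suffix list, lowercase only (Cyrillic "a", "em", "i").
    // A stand-in for illustration, NOT the Snowball rule set.
    static final String[] TOY_SUFFIXES = { "\u0430\u043c\u0438" };

    // Strip the first matching lowercase suffix, if any.
    public static String toyStem(String word) {
        for (String suffix : TOY_SUFFIXES) {
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // A seven-letter uppercase Cyrillic word ending in the toy suffix.
        String upper = "\u041a\u041d\u0418\u0413\u0410\u041c\u0418";
        String lowered = upper.toLowerCase(Locale.ROOT);

        // Without lowercasing, the suffix is not recognized.
        System.out.println(toyStem(upper).length());   // prints 7
        // After lowercasing, the three-letter suffix is removed.
        System.out.println(toyStem(lowered).length()); // prints 4
    }
}
```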
RussianAnalyzer: Analyzer for the Russian language. Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords is used unless an alternative list is specified.
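
The stopword behaviour described for RussianAnalyzer (a default set unless an alternative list is given) might look roughly like the following. The class StopwordSketch, the method removeStopwords() and the three-word default set are assumptions made for this example; they are not the analyzer's actual API or its real default list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopwordSketch {
    // Hypothetical default set: three short Russian function words,
    // written with Unicode escapes ("i", "v", "na" in transliteration).
    public static final Set<String> DEFAULT_STOPWORDS =
            new HashSet<>(Arrays.asList("\u0438", "\u0432", "\u043d\u0430"));

    // Drop every token found in the stopword set; fall back to the
    // default set when no alternative list is supplied.
    public static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        Set<String> active = (stopwords != null) ? stopwords : DEFAULT_STOPWORDS;
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!active.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Tokens "kot", "i", "dom" (transliterated), already lowercased.
        List<String> tokens = Arrays.asList(
                "\u043a\u043e\u0442", "\u0438", "\u0434\u043e\u043c");

        // The default set removes the middle token only.
        System.out.println(removeStopwords(tokens, null).size()); // prints 2

        // A custom list replaces the default set rather than extending it.
        Set<String> custom = new HashSet<>(Arrays.asList("\u043a\u043e\u0442", "\u0434\u043e\u043c"));
        System.out.println(removeStopwords(tokens, custom).size()); // prints 1
    }
}
```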
RussianStemmer: Russian stemming algorithm implementation (see http://snowball.sourceforge.net for detailed description).
RussianLowerCaseFilter: Normalizes token text to lower case according to the given Russian charset.
TestRussianAnalyzer: Test case for RussianAnalyzer.
TestRussianStem