recoinx.clef
Class CLEFAnalyzer

java.lang.Object
  recoinx.clef.CLEFAnalyzer
- public class CLEFAnalyzer
- extends java.lang.Object
The CLEFAnalyzer class is the base class for text preprocessing tasks such as
formatting, stemming, and stopword removal. It uses a SnowballAnalyzer for
stemming and stopword removal and can process English, German, French, and Spanish.
A stopword file is only considered for stopword removal if its name follows the
naming convention <language prefix>_stopwords.txt.
Example: EN_stopwords.txt for the English stopword file. The language prefixes are: EN (English), DE (German), FR (French), ES (Spanish). Each stopword must appear on its own line in the file.
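The naming convention can be illustrated with a minimal sketch. The class StopwordFileNames and its method stopwordFile are hypothetical helpers and not part of recoinx.clef; only the prefixes and the _stopwords.txt suffix are taken from the description above.

```java
import java.io.File;

/** Hypothetical helper, not part of recoinx.clef. */
final class StopwordFileNames {

    /** Language prefixes as listed in the class description. */
    static final String[] PREFIXES = {"EN", "DE", "FR", "ES"};

    /**
     * Builds the expected stopword file for a language prefix,
     * for example "EN" gives <stopwordPath>/EN_stopwords.txt.
     */
    static File stopwordFile(String stopwordPath, String prefix) {
        return new File(stopwordPath, prefix + "_stopwords.txt");
    }

    public static void main(String[] args) {
        // "stopwords" is an assumed directory name used only for illustration.
        for (String prefix : PREFIXES) {
            System.out.println(stopwordFile("stopwords", prefix).getPath());
        }
    }
}
```

A file in the stopword path whose name does not match this pattern would, according to the description above, not be used for stopword removal.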
| Field Summary | |
| private SnowballAnalyzer | englishAnalyzer: A SnowballAnalyzer for English. |
| private SnowballAnalyzer | frenchAnalyzer: A SnowballAnalyzer for French. |
| private SnowballAnalyzer | germanAnalyzer: A SnowballAnalyzer for German. |
| (package private) static org.apache.log4j.Logger | logger: The logger of this class. |
| private SnowballAnalyzer | spanishAnalyzer: A SnowballAnalyzer for Spanish. |
| (package private) java.lang.String | stopwordPath: The path where the stopword files can be found. |
| Constructor Summary |
| CLEFAnalyzer(java.lang.String stopPath): Creates a new CLEFAnalyzer with the specified path to the stopword files. |
| Method Summary | |
| java.lang.String | analyze(java.lang.String topic, int language): Performs stopword removal and stemming on the specified topic according to the specified language. For the supported language values, see CLEFConstants. |
| static java.lang.String[] | createStopwords(java.io.File file): Creates a String[] of stopwords from the specified file. |
| static java.lang.String | getAnalyzedString(SnowballAnalyzer analyzer, java.lang.String string): Parses the specified string and applies stemming using the specified SnowballAnalyzer. |
| Methods inherited from class java.lang.Object |
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
logger
static org.apache.log4j.Logger logger
- The logger of this class.
stopwordPath
java.lang.String stopwordPath
- The path where the stopword files can be found.
germanAnalyzer
private SnowballAnalyzer germanAnalyzer
- A SnowballAnalyzer for German.
spanishAnalyzer
private SnowballAnalyzer spanishAnalyzer
- A SnowballAnalyzer for Spanish.
frenchAnalyzer
private SnowballAnalyzer frenchAnalyzer
- A SnowballAnalyzer for French.
englishAnalyzer
private SnowballAnalyzer englishAnalyzer
- A SnowballAnalyzer for English.
| Constructor Detail |
CLEFAnalyzer
public CLEFAnalyzer(java.lang.String stopPath)
- Creates a new CLEFAnalyzer with the specified path to the stopword files. The
CLEFAnalyzer holds four String[] arrays of stopwords and four SnowballAnalyzers,
one of each per supported language.
| Method Detail |
analyze
public java.lang.String analyze(java.lang.String topic, int language)
- Performs stopword removal and stemming on the specified topic according to the
specified language.
For the supported language values, see CLEFConstants.
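The two steps named above, stopword removal followed by stemming, can be sketched in plain Java. This is not the CLEFAnalyzer implementation, which delegates to a SnowballAnalyzer. The class AnalyzeSketch, the tokenization rule, and the toy stemmer are assumptions made only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical sketch, not the CLEFAnalyzer implementation. */
final class AnalyzeSketch {

    /**
     * Lowercases the topic, splits it on non-alphanumeric characters,
     * drops stopwords, and applies the given stemmer to the remaining tokens.
     * The stopword set is expected to contain lowercase entries.
     */
    static String analyze(String topic, Set<String> stopwords, UnaryOperator<String> stemmer) {
        List<String> kept = new ArrayList<>();
        for (String token : topic.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || stopwords.contains(token)) {
                continue; // skip empty fragments and stopwords
            }
            kept.add(stemmer.apply(token));
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "of", "in"));
        // Toy stemmer that strips a trailing "s"; a stand-in for a real Snowball stemmer.
        UnaryOperator<String> toyStemmer =
                w -> (w.endsWith("s") && w.length() > 3) ? w.substring(0, w.length() - 1) : w;
        System.out.println(analyze("The effects of pesticides in baby food", stopwords, toyStemmer));
    }
}
```

A real Snowball stemmer applies language-specific rules, so its output will generally differ from that of the toy stemmer used here.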
getAnalyzedString
public static java.lang.String getAnalyzedString(SnowballAnalyzer analyzer, java.lang.String string)
- Parses the specified string and applies stemming using the specified
SnowballAnalyzer.
createStopwords
public static java.lang.String[] createStopwords(java.io.File file)
- Creates a String[] of stopwords from the specified file. Each stopword must
appear on a separate line. If an error occurs while reading the file, the
returned array is empty.
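The behavior described above can be sketched as follows. This is a minimal illustration and not the source of createStopwords: the class StopwordLoader, the UTF-8 encoding, and the trimming and skipping of blank lines are assumptions.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not the source of CLEFAnalyzer.createStopwords. */
final class StopwordLoader {

    /**
     * Reads one stopword per line from the given file.
     * Returns an empty array if the file cannot be read.
     */
    static String[] loadStopwords(File file) {
        List<String> words = new ArrayList<>();
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader reader =
                     Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // assumption: blank lines are ignored
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Mirrors the documented contract: any read error yields an empty result.
            return new String[0];
        }
        return words.toArray(new String[0]);
    }
}
```

Returning an empty array on failure means a caller cannot tell a missing file from an empty one, so a missing stopword file silently disables stopword removal for that language.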