recoinx.clef
Class CLEFAnalyzer

java.lang.Object
  extended by recoinx.clef.CLEFAnalyzer

public class CLEFAnalyzer
extends java.lang.Object

The CLEFAnalyzer class is the base class used for formatting, stemming, stopword removal, etc. It uses a SnowballAnalyzer for stemming and stopword removal and can process English, German, French and Spanish.

To be considered for stopword removal, the stopword files must follow the naming convention

<LanguagePrefix>_stopwords.txt

Example: EN_stopwords.txt is the English stopword file. The language prefixes are EN (English), DE (German), FR (French) and ES (Spanish). Each stopword must appear on its own line in the file.
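
The naming convention can be illustrated with a short sketch. The class StopwordFileNames and its helpers fileNameFor and resolve are hypothetical names chosen for this example and are not part of CLEFAnalyzer; only the four prefixes and the "_stopwords.txt" suffix come from the description above.

```java
import java.io.File;

// Hypothetical helper class, not part of recoinx.clef.
final class StopwordFileNames {
    // Language prefixes listed in the class description.
    static final String[] PREFIXES = {"EN", "DE", "FR", "ES"};

    // Builds "<LanguagePrefix>_stopwords.txt" for the given prefix.
    static String fileNameFor(String prefix) {
        return prefix + "_stopwords.txt";
    }

    // Resolves the stopword file inside the given stopword directory.
    static File resolve(String stopwordPath, String prefix) {
        return new File(stopwordPath, fileNameFor(prefix));
    }

    public static void main(String[] args) {
        // "stopwords" is an assumed directory name used only for illustration.
        for (String prefix : PREFIXES) {
            System.out.println(resolve("stopwords", prefix).getPath());
        }
    }
}
```

A file in the stopword directory whose name does not match this pattern would, by the convention above, not be picked up.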


Field Summary
private  SnowballAnalyzer englishAnalyzer
          A SnowballAnalyzer for English.
private  SnowballAnalyzer frenchAnalyzer
          A SnowballAnalyzer for French.
private  SnowballAnalyzer germanAnalyzer
          A SnowballAnalyzer for German.
(package private) static org.apache.log4j.Logger logger
          The logger of this class.
private  SnowballAnalyzer spanishAnalyzer
          A SnowballAnalyzer for Spanish.
(package private)  java.lang.String stopwordPath
          The path where the stopword files can be found.
 
Constructor Summary
CLEFAnalyzer(java.lang.String stopPath)
          Creates a new CLEFAnalyzer with the specified path to the stopword files.
 
Method Summary
 java.lang.String analyze(java.lang.String topic, int language)
          Performs stopword removal and stemming on the specified topic according to the specified language.
For the languages see CLEFConstants.
static java.lang.String[] createStopwords(java.io.File file)
          Creates a String[] of stopwords from the specified file.
static java.lang.String getAnalyzedString(SnowballAnalyzer analyzer, java.lang.String string)
          Parses the specified string and applies stemming using the specified SnowballAnalyzer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

static org.apache.log4j.Logger logger
The logger of this class.


stopwordPath

java.lang.String stopwordPath
The path where the stopword files can be found.


germanAnalyzer

private SnowballAnalyzer germanAnalyzer
A SnowballAnalyzer for German.


spanishAnalyzer

private SnowballAnalyzer spanishAnalyzer
A SnowballAnalyzer for Spanish.


frenchAnalyzer

private SnowballAnalyzer frenchAnalyzer
A SnowballAnalyzer for French.


englishAnalyzer

private SnowballAnalyzer englishAnalyzer
A SnowballAnalyzer for English.

Constructor Detail

CLEFAnalyzer

public CLEFAnalyzer(java.lang.String stopPath)
Creates a new CLEFAnalyzer with the specified path to the stopword files. The CLEFAnalyzer will hold four String[] of stopwords and four SnowballAnalyzers, one of each per supported language.

Method Detail

analyze

public java.lang.String analyze(java.lang.String topic,
                                int language)
Performs stopword removal and stemming on the specified topic according to the specified language.
For the languages see CLEFConstants.
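
The stopword-removal half of this step can be sketched without the Snowball stemmer. This is only an illustration under stated assumptions, not the actual implementation: the class StopwordRemovalSketch and the method removeStopwords are hypothetical, tokens are assumed to be separated by whitespace, matching is assumed to be case-insensitive, and stemming is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the stopword-removal step; stemming is omitted.
final class StopwordRemovalSketch {
    static String removeStopwords(String topic, String[] stopwords) {
        // Assumption: stopwords are compared case-insensitively.
        Set<String> stopSet = new HashSet<>();
        for (String word : stopwords) {
            stopSet.add(word.toLowerCase(Locale.ROOT));
        }
        List<String> kept = new ArrayList<>();
        // Assumption: tokens are separated by whitespace.
        for (String token : topic.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty topic yields a single empty token
            }
            if (!stopSet.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }
}
```

In the real class the remaining tokens would additionally be stemmed by the SnowballAnalyzer for the selected language.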


getAnalyzedString

public static java.lang.String getAnalyzedString(SnowballAnalyzer analyzer,
                                                 java.lang.String string)
Parses the specified string and applies stemming using the specified SnowballAnalyzer.


createStopwords

public static java.lang.String[] createStopwords(java.io.File file)
Creates a String[] of stopwords from the specified file. Each stopword must appear on a separate line. If an error occurs while reading the file, the returned array is empty.
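
A method with this contract might look like the following minimal sketch. It is not the source of CLEFAnalyzer: the class name StopwordLoader is hypothetical, and the UTF-8 encoding, the trimming of each line and the skipping of blank lines are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-implementation of the documented contract.
final class StopwordLoader {
    static String[] createStopwords(File file) {
        List<String> words = new ArrayList<>();
        // Assumption: the stopword file is UTF-8 encoded.
        try (BufferedReader reader =
                 Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: surrounding whitespace and blank lines are ignored.
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Documented behaviour: any read error yields an empty result.
            return new String[0];
        }
        return words.toArray(new String[0]);
    }
}
```

Returning an empty array instead of throwing means the caller cannot tell a missing file from an empty one, so a log message at the catch site may be worth adding.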