org.apache.nutch.analysis
public class: NutchDocumentAnalyzer
java.lang.Object
   org.apache.lucene.analysis.Analyzer
      org.apache.nutch.analysis.NutchAnalyzer
         org.apache.nutch.analysis.NutchDocumentAnalyzer

All Implemented Interfaces:
    Pluggable, org.apache.hadoop.conf.Configurable

The analyzer used for Nutch documents. It uses the JavaCC-defined lexical analyzer NutchDocumentTokenizer, with no stop list. This keeps document analysis consistent with query parsing.
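The consistency argument above rests on one idea: if documents and queries go through the same tokenizer and neither side drops stop words, a query phrase produces exactly the terms that were indexed. The following is a minimal, dependency-free sketch of that idea. The class SharedTokenizerSketch and its letter-or-digit splitting rule are hypothetical stand-ins; they are not the JavaCC-generated NutchDocumentTokenizer and do not reproduce its grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a shared lexical analyzer. NOT NutchDocumentTokenizer;
// it only illustrates one tokenizer, with no stop list, used for documents and queries.
public class SharedTokenizerSketch {

  // Splits text into lowercase runs of letters or digits. No stop words are
  // removed, so words such as "the" and "of" survive as terms.
  static List<String> tokenize(String text) {
    List<String> terms = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (Character.isLetterOrDigit(c)) {
        current.append(Character.toLowerCase(c));
      } else if (current.length() > 0) {
        terms.add(current.toString());
        current.setLength(0);
      }
    }
    if (current.length() > 0) {
      terms.add(current.toString());
    }
    return terms;
  }

  public static void main(String[] args) {
    List<String> docTerms = tokenize("The Lord of the Rings");
    List<String> queryTerms = tokenize("\"the lord of the rings\"");
    System.out.println(docTerms);                    // [the, lord, of, the, rings]
    System.out.println(docTerms.equals(queryTerms)); // true
  }
}
```

Because nothing is filtered on either side, a phrase query containing "of the" can still line up term by term with the indexed text.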
Field Summary
public static final int INTER_ANCHOR_GAP    The number of unused term positions between anchors in the anchor field.
Fields inherited from org.apache.nutch.analysis.NutchAnalyzer:
X_POINT_ID, conf
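Leaving unused term positions between anchors means that the last term of one anchor and the first term of the next are never adjacent, so a phrase query cannot match across two unrelated anchors. The following is a minimal, dependency-free sketch of that effect. The class AnchorGapSketch, its Posting type, and the GAP value of 4 are hypothetical choices for illustration; the sketch does not read the real INTER_ANCHOR_GAP constant or use Lucene's position-increment machinery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of an inter-anchor position gap. GAP is a hypothetical value chosen
// for illustration; it is not read from NutchDocumentAnalyzer.INTER_ANCHOR_GAP.
public class AnchorGapSketch {

  static final int GAP = 4; // hypothetical: unused positions between anchors

  // A term and the position it occupies in the anchor field.
  static final class Posting {
    final String term;
    final int position;

    Posting(String term, int position) {
      this.term = term;
      this.position = position;
    }
  }

  // Gives terms within one anchor consecutive positions, and leaves GAP
  // unused positions before the first term of every anchor after the first.
  static List<Posting> assignPositions(List<List<String>> anchors) {
    List<Posting> postings = new ArrayList<>();
    int position = -1;
    boolean first = true;
    for (List<String> anchor : anchors) {
      if (!first) {
        position += GAP;
      }
      first = false;
      for (String term : anchor) {
        position += 1;
        postings.add(new Posting(term, position));
      }
    }
    return postings;
  }

  // A two-term phrase matches only if the second term sits at the position
  // immediately after the first.
  static boolean phraseMatches(List<Posting> postings, String a, String b) {
    for (Posting p : postings) {
      for (Posting q : postings) {
        if (p.term.equals(a) && q.term.equals(b) && q.position == p.position + 1) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<List<String>> anchors = Arrays.asList(
        Arrays.asList("apache", "nutch"),
        Arrays.asList("search", "engine"));
    List<Posting> postings = assignPositions(anchors);
    for (Posting p : postings) {
      System.out.println(p.term + " @ " + p.position); // apache @ 0, nutch @ 1, search @ 6, engine @ 7
    }
    System.out.println(phraseMatches(postings, "apache", "nutch")); // true
    System.out.println(phraseMatches(postings, "nutch", "search")); // false
  }
}
```

With GAP = 4, positions 2 through 5 stay empty, so "nutch search" fails as a phrase even though the two words appear in consecutive anchors.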
Constructor:
 public NutchDocumentAnalyzer(Configuration conf) 
    Parameters:
    conf - the configuration used by this analyzer
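The constructor takes a configuration up front, and the class also exposes getConf and setConf through the Configurable interface. The following is a minimal, dependency-free sketch of that pattern. ConfigurableSketch is hypothetical, and java.util.Properties stands in for org.apache.hadoop.conf.Configuration so the example compiles without Hadoop on the classpath.

```java
import java.util.Properties;

// Hypothetical stand-in for a Configurable component. java.util.Properties
// plays the role of org.apache.hadoop.conf.Configuration in this sketch.
public class ConfigurableSketch {

  private Properties conf;

  // Same shape as a constructor that receives its configuration up front;
  // it delegates to setConf so both paths store the configuration the same way.
  public ConfigurableSketch(Properties conf) {
    setConf(conf);
  }

  // Replaces the stored configuration.
  public void setConf(Properties conf) {
    this.conf = conf;
  }

  // Returns the configuration currently in use.
  public Properties getConf() {
    return conf;
  }

  public static void main(String[] args) {
    Properties initial = new Properties();
    ConfigurableSketch component = new ConfigurableSketch(initial);
    System.out.println(component.getConf() == initial); // true

    Properties replacement = new Properties();
    component.setConf(replacement);
    System.out.println(component.getConf() == replacement); // true
  }
}
```

Routing the constructor through setConf keeps a single place where the configuration is stored.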
Method Summary from org.apache.nutch.analysis.NutchDocumentAnalyzer:
tokenStream
Methods from org.apache.nutch.analysis.NutchAnalyzer:
getConf, setConf, tokenStream
Methods from org.apache.lucene.analysis.Analyzer:
getPositionIncrementGap, reusableTokenStream, tokenStream
Methods from java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Method Detail from org.apache.nutch.analysis.NutchDocumentAnalyzer:
 public TokenStream tokenStream(String fieldName,
    Reader reader) 
    Returns a new token stream for text from the named field.
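Each call to tokenStream wraps the supplied Reader in a fresh stream that produces tokens as it is consumed. The following is a minimal, dependency-free analogue of that contract. TokenStreamSketch and SimpleTokenStream are hypothetical names, the stream is modeled as an Iterator of strings rather than a Lucene TokenStream, and the splitting rule is illustrative only. The fieldName parameter is accepted to mirror the signature but ignored here; the real analyzer may treat fields such as the anchor field differently.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical analogue of tokenStream(String, Reader). Not Lucene's TokenStream API:
// it only shows a new stream per call that pulls tokens from the Reader on demand.
public class TokenStreamSketch {

  static final class SimpleTokenStream implements Iterator<String> {
    private final Reader reader;
    private String next; // next token, or null once the Reader is exhausted

    SimpleTokenStream(Reader reader) {
      this.reader = reader;
      this.next = advance();
    }

    // Reads characters until one lowercase letter-or-digit token is complete.
    private String advance() {
      StringBuilder sb = new StringBuilder();
      try {
        int c;
        while ((c = reader.read()) != -1) {
          if (Character.isLetterOrDigit(c)) {
            sb.append(Character.toLowerCase((char) c));
          } else if (sb.length() > 0) {
            return sb.toString();
          }
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return sb.length() > 0 ? sb.toString() : null;
    }

    @Override
    public boolean hasNext() {
      return next != null;
    }

    @Override
    public String next() {
      if (next == null) {
        throw new NoSuchElementException();
      }
      String current = next;
      next = advance();
      return current;
    }
  }

  // fieldName mirrors the documented signature; this sketch treats every field alike.
  static Iterator<String> tokenStream(String fieldName, Reader reader) {
    return new SimpleTokenStream(reader);
  }

  public static void main(String[] args) {
    Iterator<String> tokens = tokenStream("content", new StringReader("Apache Nutch, web search"));
    while (tokens.hasNext()) {
      System.out.println(tokens.next()); // apache, nutch, web, search (one per line)
    }
  }
}
```

Reading lazily means the whole field text never has to be buffered as a token list before indexing begins.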