org.apache.nutch.indexer
public interface IndexingFilter

All Superinterfaces:
    org.apache.hadoop.conf.Configurable, Pluggable

All Known Implementing Classes:
    RelTagIndexingFilter, SubcollectionIndexingFilter, MoreIndexingFilter, LanguageIndexingFilter, BasicIndexingFilter, AnchorIndexingFilter, CCIndexingFilter, FeedIndexingFilter, TLDIndexingFilter

Extension point for indexing. Lets plugins add metadata to the fields that are indexed. All plugins found that implement this extension point are run sequentially on the parse.
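The sequential behaviour can be pictured as a chain in which each filter receives the document returned by the previous one, and a null return (which, per the filter method description, removes the document from indexing) stops the chain. The sketch below is a simplified model under stated assumptions, not Nutch's own chaining code: `Doc`, `SimpleFilter` and `FilterChain` are hypothetical stand-ins, and the filter signature is reduced to a document and a URL string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: field name -> list of values.
class Doc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Hypothetical simplified filter contract; the real IndexingFilter.filter
// also receives Parse, Text url, CrawlDatum and Inlinks.
interface SimpleFilter {
    Doc filter(Doc doc, String url);
}

class FilterChain {
    private final List<SimpleFilter> filters;

    FilterChain(List<SimpleFilter> filters) {
        this.filters = filters;
    }

    // Runs every filter in order; each one sees the previous filter's output.
    // A null from any filter drops the document and skips the remaining filters.
    Doc run(Doc doc, String url) {
        for (SimpleFilter f : filters) {
            doc = f.filter(doc, url);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }
}
```

Because order matters, a filter placed later in the chain can rely on fields added by earlier ones, and a filter that drops a document prevents all later filters from seeing it.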
Field Summary
static final String X_POINT_ID: The name of the extension point.
Method Summary
    addIndexBackendOptions, filter
Method Detail
 public  void addIndexBackendOptions(Configuration conf)
    Adds index-level configuration options. Implementations can update the given configuration to pass document-independent information to indexing backends. As a rule of thumb, prefix meta keys with the name of the intended backend. For example, when passing information to the Lucene backend, prefix keys with "lucene.".
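A minimal sketch of the prefixing convention, assuming a plugin that wants to pass two index-wide settings to the Lucene backend. `Conf` is a hypothetical stand-in for `org.apache.hadoop.conf.Configuration` (only string `set`/`get` are modelled), and the key names after the `lucene.` prefix are invented for illustration; they are not options any real backend is known to read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration,
// modelling only string set/get.
class Conf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// Hypothetical plugin. The keys below are illustrative only.
class LangOptionsPlugin {
    // Prefix naming the backend the options are meant for.
    static final String BACKEND_PREFIX = "lucene.";

    // Writes settings that apply to the index as a whole, not to any one document.
    void addIndexBackendOptions(Conf conf) {
        conf.set(BACKEND_PREFIX + "field.lang.store", "yes");
        conf.set(BACKEND_PREFIX + "field.lang.tokenize", "no");
    }
}
```

The prefix lets each backend pick out its own keys from a shared configuration without colliding with options meant for other backends.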
 public NutchDocument filter(NutchDocument doc,
    Parse parse,
    Text url,
    CrawlDatum datum,
    Inlinks inlinks) throws IndexingException
    Adds fields to, or otherwise modifies, the document that will be indexed for a parse. Returning null removes the document from indexing.
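A sketch of what an implementation of `filter` might do, under simplifying assumptions: `IndexDoc` is a hypothetical stand-in for `NutchDocument`, `Parse` is reduced to its text, `Text url` to a `String`, and `CrawlDatum` and `Inlinks` are omitted. The `HostFieldFilter` class and its `host` field name are illustrative only. It adds a field for normal pages and returns null for pages with no parsed text, so they are not indexed.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NutchDocument: field name -> list of values.
class IndexDoc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }
}

// Hypothetical filter mirroring the contract of IndexingFilter.filter
// with a reduced parameter list.
class HostFieldFilter {
    IndexDoc filter(IndexDoc doc, String parseText, String url) {
        // Returning null removes the document from indexing.
        if (parseText == null || parseText.trim().isEmpty()) {
            return null;
        }
        // URI.create throws IllegalArgumentException for malformed URLs.
        String host = URI.create(url).getHost();
        if (host != null) {
            doc.add("host", host); // illustrative field name
        }
        return doc;
    }
}
```

Returning the same document instance after modifying it keeps the filter cheap and lets the next filter in the sequence continue from the updated fields.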