org.apache.nutch.indexer
public interface: IndexingFilter
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, Pluggable
All Known Implementing Classes:
RelTagIndexingFilter, SubcollectionIndexingFilter, MoreIndexingFilter, LanguageIndexingFilter, BasicIndexingFilter, AnchorIndexingFilter, CCIndexingFilter, FeedIndexingFilter, TLDIndexingFilter
Extension point for indexing. Permits one to add metadata to the indexed
fields. All plugins found which implement this extension point are run
sequentially on the parse.
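The sequential behaviour described above can be pictured with a small sketch. It uses simplified stand-in types rather than the real Nutch classes: `Doc`, `SimpleFilter`, and `runAll` are hypothetical names, and the stand-in filter takes one argument instead of the five that `IndexingFilter.filter` takes. Stopping the chain as soon as a filter returns null is an assumption made here to match the documented meaning of a null return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterChainSketch {

    /** Hypothetical stand-in for a document: an ordered list of field names. */
    static class Doc {
        final List<String> fields = new ArrayList<>();
    }

    /** Hypothetical one-argument stand-in for IndexingFilter. */
    interface SimpleFilter {
        Doc filter(Doc doc);
    }

    /** Runs each filter in order; a null result drops the document (assumed). */
    static Doc runAll(List<SimpleFilter> filters, Doc doc) {
        for (SimpleFilter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null; // later filters never see a dropped document
            }
        }
        return doc;
    }

    /** Two filters applied in order; each appends one field. */
    static List<String> demo() {
        SimpleFilter addTitle = d -> { d.fields.add("title"); return d; };
        SimpleFilter addLang = d -> { d.fields.add("lang"); return d; };
        return runAll(Arrays.asList(addTitle, addLang), new Doc()).fields;
    }

    /** A filter that returns null ends the chain with no document. */
    static boolean droppedDemo() {
        SimpleFilter drop = d -> null;
        SimpleFilter keep = d -> d;
        return runAll(Arrays.asList(drop, keep), new Doc()) == null;
    }

    public static void main(String[] args) {
        System.out.println(demo());
        System.out.println(droppedDemo());
    }
}
```

Because each filter receives the document produced by the previous one, the order in which plugins run can affect the fields a later filter sees.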
Field Summary:

| Modifier and type | Field | Description |
|---|---|---|
| static final String | X_POINT_ID | The name of the extension point. |

Method Detail from org.apache.nutch.indexer.IndexingFilter:
public void addIndexBackendOptions(Configuration conf)
Adds index-level configuration options.
Implementations can update the given configuration to pass document-independent
information to indexing backends. As a rule of thumb, prefix meta keys
with the name of the intended backend. For example, when
passing information to the Lucene backend, prefix keys with "lucene.".
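A minimal sketch of the key-prefix convention follows. To stay self-contained it defines a small stand-in `Configuration` class exposing only `set` and `get`, in place of `org.apache.hadoop.conf.Configuration`. The filter class `TitleOptionsFilter` and the option key `lucene.field.store.title` are hypothetical and chosen purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class BackendOptionsSketch {

    /** Stand-in for org.apache.hadoop.conf.Configuration (set/get only). */
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();

        void set(String name, String value) {
            props.put(name, value);
        }

        String get(String name) {
            return props.get(name);
        }
    }

    /** Hypothetical filter; only addIndexBackendOptions is shown. */
    static class TitleOptionsFilter {
        // Keys are prefixed with the name of the intended backend.
        static final String BACKEND_PREFIX = "lucene.";

        public void addIndexBackendOptions(Configuration conf) {
            // Hypothetical, document-independent option for the backend.
            conf.set(BACKEND_PREFIX + "field.store.title", "YES");
        }
    }

    /** Builds a configuration and lets the filter add its options. */
    static Configuration configure() {
        Configuration conf = new Configuration();
        new TitleOptionsFilter().addIndexBackendOptions(conf);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(configure().get("lucene.field.store.title"));
    }
}
```

Keeping the prefix in one constant makes it harder for a filter to write keys into another backend's namespace by accident.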
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Adds fields or otherwise modifies the document that will be indexed for a
parse. Unwanted documents can be removed from indexing by returning null.
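The contract of `filter` can be illustrated with a short sketch: add a field when the document is wanted, return null when it is not. All types below are minimal stand-ins declared locally so the example compiles on its own; they are not the real Nutch or Hadoop classes and mirror only the pieces used here. The filter `NonEmptyTextFilter` and the field name `contentLength` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterSketch {

    // Stand-ins for the Nutch/Hadoop types named in the signature.
    static class NutchDocument {
        private final Map<String, List<String>> fields = new HashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        List<String> getFieldValues(String name) {
            return fields.getOrDefault(name, new ArrayList<>());
        }
    }

    static class Parse {
        private final String text;
        Parse(String text) { this.text = text; }
        String getText() { return text; }
    }

    static class Text {
        private final String value;
        Text(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    static class CrawlDatum { }
    static class Inlinks { }

    static class IndexingException extends Exception {
        IndexingException(String message) { super(message); }
    }

    /** Hypothetical filter: drops empty pages, records text length otherwise. */
    static class NonEmptyTextFilter {
        public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
                CrawlDatum datum, Inlinks inlinks) throws IndexingException {
            String text = parse.getText();
            if (text == null || text.trim().isEmpty()) {
                return null; // unwanted document: removed from indexing
            }
            doc.add("contentLength", String.valueOf(text.length()));
            return doc;
        }
    }

    /** Convenience wrapper that runs the filter on a page with the given text. */
    static NutchDocument run(String pageText) {
        try {
            return new NonEmptyTextFilter().filter(new NutchDocument(),
                    new Parse(pageText), new Text("http://example.com/"),
                    new CrawlDatum(), new Inlinks());
        } catch (IndexingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("hello").getFieldValues("contentLength"));
        System.out.println(run("   ") == null);
    }
}
```

Returning the same `doc` instance after modifying it, rather than building a new one, keeps fields added by earlier filters intact.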