nutch-1.0
org.apache.nutch.parse
public interface: HtmlParseFilter

All Superinterfaces:
    org.apache.hadoop.conf.Configurable, Pluggable

All Known Implementing Classes:
    RelTagParser, JSParseFilter, CCParseFilter, HTMLLanguageParser

Extension point for DOM-based HTML parsers. It lets plugins add metadata to HTML parses. All plugins found that implement this extension point are run sequentially on the parse.
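The sequential behaviour described above can be pictured as a simple loop in which each filter receives the result of the previous one. The following is a minimal sketch only, not Nutch's own dispatch code: `ParseSketch`, `ParseFilterSketch`, `FilterChainSketch` and `runAll` are hypothetical stand-ins invented for illustration, and the real ordering and error handling of plugins are not covered by this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse: an ordered list of metadata entries.
final class ParseSketch {
    final List<String> metadata = new ArrayList<>();
}

// Hypothetical stand-in for one plugin registered at the extension point.
interface ParseFilterSketch {
    ParseSketch filter(ParseSketch parse);
}

final class FilterChainSketch {
    // Runs every filter in list order, feeding each one the previous result.
    static ParseSketch runAll(List<ParseFilterSketch> filters, ParseSketch parse) {
        ParseSketch current = parse;
        for (ParseFilterSketch filter : filters) {
            current = filter.filter(current);
        }
        return current;
    }

    public static void main(String[] args) {
        ParseFilterSketch language = p -> { p.metadata.add("lang=en"); return p; };
        ParseFilterSketch license = p -> { p.metadata.add("license=cc-by"); return p; };
        List<ParseFilterSketch> filters = List.of(language, license);

        ParseSketch result = runAll(filters, new ParseSketch());
        System.out.println(result.metadata); // prints [lang=en, license=cc-by]
    }
}
```

Because every filter sees what earlier filters added, the order in which the filters run can affect the final parse.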
Field Summary
static final String X_POINT_ID    The name of the extension point.
Method from org.apache.nutch.parse.HtmlParseFilter Summary:
filter
Method from org.apache.nutch.parse.HtmlParseFilter Detail:
 public ParseResult filter(Content content,
    ParseResult parseResult,
    HTMLMetaTags metaTags,
    DocumentFragment doc)
    Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
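
As an illustration of what a filter might do with the DOM tree, the sketch below walks a `DocumentFragment` and records the number of anchor elements as metadata. It is a simplified, self-contained approximation: `SketchParseResult`, `SketchHtmlParseFilter`, `AnchorCountFilter`, `FilterSketch` and the `anchor.count` key are hypothetical names, and the `Content` and `HTMLMetaTags` parameters of the real `filter` method are left out so that the example runs with the JDK alone. A real plugin would implement `HtmlParseFilter` itself and work with Nutch's own `ParseResult`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical stand-in for Nutch's ParseResult: just a metadata map.
final class SketchParseResult {
    final Map<String, String> metadata = new LinkedHashMap<>();
}

// Hypothetical stand-in mirroring the shape of HtmlParseFilter.filter;
// the Content and HTMLMetaTags parameters are omitted for brevity.
interface SketchHtmlParseFilter {
    SketchParseResult filter(SketchParseResult parseResult, DocumentFragment doc);
}

// Counts <a> elements in the DOM tree and stores the total as metadata.
final class AnchorCountFilter implements SketchHtmlParseFilter {
    @Override
    public SketchParseResult filter(SketchParseResult parseResult, DocumentFragment doc) {
        parseResult.metadata.put("anchor.count", Integer.toString(countAnchors(doc)));
        return parseResult;
    }

    // Depth-first walk over the node and all of its descendants.
    static int countAnchors(Node node) {
        int count = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())) {
            count++;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countAnchors(child);
        }
        return count;
    }
}

final class FilterSketch {
    // Builds a tiny fragment: <body><a/><p><a/></p></body>
    static DocumentFragment sampleFragment() {
        try {
            Document document =
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DocumentFragment fragment = document.createDocumentFragment();
            Element body = document.createElement("body");
            body.appendChild(document.createElement("a"));
            body.appendChild(document.createElement("p"))
                .appendChild(document.createElement("a"));
            fragment.appendChild(body);
            return fragment;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM document", e);
        }
    }

    public static void main(String[] args) {
        SketchParseResult result =
            new AnchorCountFilter().filter(new SketchParseResult(), sampleFragment());
        System.out.println(result.metadata); // prints {anchor.count=2}
    }
}
```

Returning the (possibly modified) parse result, rather than nothing, is what allows the next filter in the sequence to build on it.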