org.apache.nutch.protocol
public interface Protocol

All Implemented Interfaces:
    org.apache.hadoop.conf.Configurable, Pluggable

All Known Implementing Classes:
    File, HttpBase, Http, Ftp, Http

A retriever of URL content. Implemented by protocol extensions.
Field Summary
 public static final String X_POINT_ID
    The name of the extension point.
 public static final String CHECK_BLOCKING
    Property name. If this property is set to true in the current configuration, protocol implementations should handle "politeness" limits internally. If it is set to false, these limits are assumed to be enforced elsewhere, and protocol implementations should not enforce them internally.
 public static final String CHECK_ROBOTS
    Property name. If this property is set to true in the current configuration, protocol implementations should handle robot exclusion rules internally. If it is set to false, these rules are assumed to be enforced elsewhere, and protocol implementations should not enforce them internally.
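As a rough illustration of how an implementation might honor the two flags, the sketch below reads them from a configuration and records whether each check should be enforced internally. It is not Nutch code: `java.util.Properties` stands in for the Hadoop `Configuration`, the class `ProtocolFlags` is hypothetical, and the key strings and the default of true for an unset property are assumptions made for the example.

```java
import java.util.Properties;

// Hypothetical helper, not part of Nutch: shows how a protocol plugin
// might interpret the CHECK_BLOCKING and CHECK_ROBOTS properties.
class ProtocolFlags {
    // Assumed key strings; real code should reference Protocol.CHECK_BLOCKING
    // and Protocol.CHECK_ROBOTS instead of hard-coding them.
    static final String CHECK_BLOCKING = "protocol.plugin.check.blocking";
    static final String CHECK_ROBOTS = "protocol.plugin.check.robots";

    final boolean enforcePoliteness;
    final boolean enforceRobots;

    ProtocolFlags(Properties conf) {
        // Assumption: an unset property means "enforce internally".
        enforcePoliteness = Boolean.parseBoolean(conf.getProperty(CHECK_BLOCKING, "true"));
        enforceRobots = Boolean.parseBoolean(conf.getProperty(CHECK_ROBOTS, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Politeness limits are enforced elsewhere in this toy configuration.
        conf.setProperty(CHECK_BLOCKING, "false");
        ProtocolFlags flags = new ProtocolFlags(conf);
        System.out.println("politeness handled internally: " + flags.enforcePoliteness);
        System.out.println("robots handled internally: " + flags.enforceRobots);
    }
}
```

Resolving both flags once in the constructor keeps the fetch path free of repeated configuration lookups; whether a real plugin does this is an implementation choice.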
Method from org.apache.nutch.protocol.Protocol Summary:
getProtocolOutput,   getRobotRules
Method from org.apache.nutch.protocol.Protocol Detail:
 public ProtocolOutput getProtocolOutput(Text url,
    CrawlDatum datum)
    Returns the Content for a fetchlist entry.
 public RobotRules getRobotRules(Text url,
    CrawlDatum datum)
    Retrieves the robot rules applicable to this URL.
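When robot exclusion is enforced outside the protocol implementation, a caller might combine the two methods roughly as sketched here: look up the rules first, then fetch only if the URL is allowed. The types are deliberately simplified stand-ins, not the real Nutch classes. The actual methods take `(Text url, CrawlDatum datum)` and return `ProtocolOutput` and `RobotRules`; the class `FetchSketch`, the `String`-based signatures, and the `isAllowed(String)` method are assumptions for illustration.

```java
import java.util.Optional;

// Simplified stand-ins, NOT the real Nutch types: the actual methods take
// (Text url, CrawlDatum datum) and return ProtocolOutput / RobotRules.
class FetchSketch {
    interface RobotRules {
        boolean isAllowed(String url);
    }

    interface Protocol {
        String getProtocolOutput(String url);   // stand-in for ProtocolOutput
        RobotRules getRobotRules(String url);
    }

    // Consult the robot rules first and fetch only when the URL is allowed.
    static Optional<String> fetchIfAllowed(Protocol protocol, String url) {
        RobotRules rules = protocol.getRobotRules(url);
        if (!rules.isAllowed(url)) {
            return Optional.empty();
        }
        return Optional.of(protocol.getProtocolOutput(url));
    }

    public static void main(String[] args) {
        // Toy in-memory protocol: disallows any URL containing "/private/".
        Protocol toy = new Protocol() {
            public String getProtocolOutput(String url) {
                return "content of " + url;
            }
            public RobotRules getRobotRules(String url) {
                return u -> !u.contains("/private/");
            }
        };
        System.out.println(fetchIfAllowed(toy, "http://example.com/index.html"));
        System.out.println(fetchIfAllowed(toy, "http://example.com/private/a.html"));
    }
}
```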