org.apache.nutch.scoring (nutch-1.0)
public interface: ScoringFilter

All Superinterfaces:
    Pluggable, org.apache.hadoop.conf.Configurable

All Known Implementing Classes:
    OPICScoringFilter, TLDScoringFilter, ScoringFilters, LinkAnalysisScoringFilter

A contract defining the behavior of scoring plugins. A scoring filter manipulates scoring variables in CrawlDatum and in the resulting search indexes. Filters can be chained in a specific order to provide multi-stage scoring adjustments.
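The chaining idea can be pictured with a minimal standalone sketch. It is not the Nutch API: `SimpleFilter`, `ChainSketch`, and both example stages are hypothetical stand-ins that pass a plain `float` instead of a CrawlDatum.

```java
import java.util.Arrays;
import java.util.List;

class ChainSketch {
    // Apply each stage in list order, feeding one stage's output into the next.
    static float runChain(List<SimpleFilter> chain, String url, float initial) {
        float score = initial;
        for (SimpleFilter filter : chain) {
            score = filter.adjust(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        SimpleFilter boostOrg = (url, s) -> url.endsWith(".org") ? s * 2.0f : s;
        SimpleFilter cap = (url, s) -> Math.min(s, 3.0f);
        // Order matters: boosting and then capping differs from capping and then boosting.
        System.out.println(runChain(Arrays.asList(boostOrg, cap), "http://example.org", 2.0f));
        System.out.println(runChain(Arrays.asList(cap, boostOrg), "http://example.org", 2.0f));
    }
}

// Hypothetical stand-in for one scoring stage; the real ScoringFilter operates on CrawlDatum.
interface SimpleFilter {
    float adjust(String url, float score);
}
```

Because each stage sees the previous stage's result, the configured order is part of the scoring behavior.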
Field Summary
public static final String X_POINT_ID    The name of the extension point.
Method from org.apache.nutch.scoring.ScoringFilter Summary:
distributeScoreToOutlinks,   generatorSortValue,   indexerScore,   initialScore,   injectedScore,   passScoreAfterParsing,   passScoreBeforeParsing,   updateDbScore
Method from org.apache.nutch.scoring.ScoringFilter Detail:
 public CrawlDatum distributeScoreToOutlinks(Text fromUrl,
    ParseData parseData,
    Collection targets,
    CrawlDatum adjust,
    int allCount) throws ScoringFilterException
    Distribute score value from the current page to all its outlinked pages.
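One possible distribution policy is an even split. The sketch below is an assumption, not something this page specifies: `DistributeSketch` is a hypothetical class, URLs are plain strings, and dividing by `allCount` rather than by the number of targets is an illustrative choice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DistributeSketch {
    // Give every target an equal share of the page score (assumed policy).
    // The divisor is allCount, which this sketch treats as the total number
    // of outlinks found on the page; targets may be a subset of them.
    static Map<String, Float> distribute(float pageScore, List<String> targets, int allCount) {
        Map<String, Float> shares = new LinkedHashMap<>();
        if (allCount <= 0) {
            return shares; // nothing to distribute to
        }
        float share = pageScore / allCount;
        for (String target : targets) {
            shares.put(target, share);
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(distribute(1.0f, Arrays.asList("http://a.example/", "http://b.example/"), 4));
    }
}
```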
 public float generatorSortValue(Text url,
    CrawlDatum datum,
    float initSort) throws ScoringFilterException
    This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.
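To illustrate how a sort value could feed top-N selection, here is a small sketch. Everything in it is hypothetical: `GeneratorSketch`, the `score * initSort` formula, and the in-memory map stand in for the real generator, which this page does not describe.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GeneratorSketch {
    // Assumed formula: scale the page score by the incoming sort value.
    static float sortValue(float score, float initSort) {
        return score * initSort;
    }

    // Return up to n URLs (n >= 0), highest sort value first.
    static List<String> topN(Map<String, Float> scores, float initSort, int n) {
        List<String> urls = new ArrayList<>(scores.keySet());
        urls.sort(Comparator.comparingDouble(
                (String u) -> sortValue(scores.get(u), initSort)).reversed());
        return new ArrayList<>(urls.subList(0, Math.min(n, urls.size())));
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("http://a.example/", 0.5f);
        scores.put("http://b.example/", 2.0f);
        scores.put("http://c.example/", 1.0f);
        System.out.println(topN(scores, 1.0f, 2));
    }
}
```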
 public float indexerScore(Text url,
    NutchDocument doc,
    CrawlDatum dbDatum,
    CrawlDatum fetchDatum,
    Parse parse,
    Inlinks inlinks,
    float initScore) throws ScoringFilterException
    This method calculates a Lucene document boost.
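As an illustration only, a boost could be derived from the stored page score raised to a tunable power and scaled by the incoming `initScore`. The formula, the `scorePower` parameter, and the class name `IndexerScoreSketch` are assumptions made for this sketch, not part of the interface.

```java
class IndexerScoreSketch {
    // Assumed formula: dampen (or amplify) the db score with an exponent,
    // then scale by the initial score passed in by the caller.
    static float boost(float dbScore, float scorePower, float initScore) {
        return (float) Math.pow(dbScore, scorePower) * initScore;
    }

    public static void main(String[] args) {
        // With scorePower = 0.5 the boost grows with the square root of the score.
        System.out.println(boost(4.0f, 0.5f, 1.0f));
    }
}
```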
 public  void initialScore(Text url,
    CrawlDatum datum) throws ScoringFilterException
    Set an initial score for newly discovered pages. Note: newly discovered pages have at least one inlink with its score contribution, so filter implementations may choose to set the initial score to zero (unknown value); the inlink score contribution then sets the "real" value of the new page.
 public  void injectedScore(Text url,
    CrawlDatum datum) throws ScoringFilterException
    Set an initial score for newly injected pages. Note: newly injected pages may have no inlinks, so filter implementations may wish to set this score to a non-zero value, to give newly injected pages some initial credit.
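The contrast between the two starting scores can be sketched as follows. `Datum` is a hypothetical stand-in for CrawlDatum, and the values 0.0 and 1.0 are illustrative choices that follow the notes above rather than required constants.

```java
class InitialScoreSketch {
    // Hypothetical stand-in for CrawlDatum: only the score field is modeled.
    static class Datum {
        float score = Float.NaN;
    }

    // Discovered pages start at zero; inlink contributions supply the real value later.
    static void initialScore(Datum datum) {
        datum.score = 0.0f;
    }

    // Injected pages may have no inlinks, so they get some initial credit.
    static void injectedScore(Datum datum) {
        datum.score = 1.0f;
    }

    public static void main(String[] args) {
        Datum discovered = new Datum();
        Datum injected = new Datum();
        initialScore(discovered);
        injectedScore(injected);
        System.out.println(discovered.score + " " + injected.score);
    }
}
```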
 public  void passScoreAfterParsing(Text url,
    Content content,
    Parse parse) throws ScoringFilterException
    Currently, part of score distribution is performed using only data coming from the parsing process. This method is needed to ensure that score data is present in those steps.
 public  void passScoreBeforeParsing(Text url,
    CrawlDatum datum,
    Content content) throws ScoringFilterException
    This method takes all relevant score information from the current datum (coming from a generated fetchlist) and stores it in the org.apache.nutch.protocol.Content metadata. This is needed in order to pass these values to the mechanism that distributes them to outlinked pages.
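The hand-off performed by passScoreBeforeParsing and passScoreAfterParsing can be sketched with plain maps. This is an illustration under assumptions: the maps stand in for Content and Parse metadata, and `PassScoreSketch` and the key `sketch.score` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class PassScoreSketch {
    // Hypothetical metadata key; a real filter would define its own.
    static final String SCORE_KEY = "sketch.score";

    // Before parsing: copy the datum score into the content metadata.
    static void passScoreBeforeParsing(float datumScore, Map<String, String> contentMeta) {
        contentMeta.put(SCORE_KEY, Float.toString(datumScore));
    }

    // After parsing: carry the score over to the parse metadata, if it is present.
    static void passScoreAfterParsing(Map<String, String> contentMeta, Map<String, String> parseMeta) {
        String value = contentMeta.get(SCORE_KEY);
        if (value != null) {
            parseMeta.put(SCORE_KEY, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> contentMeta = new HashMap<>();
        Map<String, String> parseMeta = new HashMap<>();
        passScoreBeforeParsing(0.75f, contentMeta);
        passScoreAfterParsing(contentMeta, parseMeta);
        System.out.println(parseMeta.get(SCORE_KEY));
    }
}
```

Storing the score as a string mirrors the fact that metadata of this kind is usually textual; a later step would parse it back before distributing it.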
 public  void updateDbScore(Text url,
    CrawlDatum old,
    CrawlDatum datum,
    List inlinked) throws ScoringFilterException
    This method calculates a new score for the CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum and on the score values contributed by inlinked pages.
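One simple way to combine these inputs is to add the inlink contributions to the old score. The sketch below assumes that additive rule, which this page does not mandate; `UpdateDbScoreSketch` is a hypothetical class and scores are plain floats rather than CrawlDatum objects.

```java
import java.util.Arrays;
import java.util.List;

class UpdateDbScoreSketch {
    // Assumed rule: new score = old score + sum of contributions from inlinked pages.
    static float updateDbScore(float oldScore, List<Float> inlinkedScores) {
        float total = oldScore;
        for (float contribution : inlinkedScores) {
            total += contribution;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(updateDbScore(1.0f, Arrays.asList(0.25f, 0.5f)));
    }
}
```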