| Method from org.apache.nutch.scoring.ScoringFilter Detail: |
public CrawlDatum distributeScoreToOutlinks(Text fromUrl,
ParseData parseData,
Collection targets,
CrawlDatum adjust,
int allCount) throws ScoringFilterException
Distribute score value from the current page to all its outlinked pages. |
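As an illustration only, the distribution step might look like the following minimal sketch. It uses plain Java types as stand-ins for the Nutch classes, and the class `OutlinkScoreSketch` and its `distribute` helper are hypothetical. It assumes an even split of the page score, and it assumes `allCount` is the total number of outlinks found on the page (which may be larger than the number of `targets`); a real filter may use a different policy.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutlinkScoreSketch {

    // Hypothetical helper: gives every target an equal share of the page score.
    // Assumption: allCount is the number of all outlinks collected from the page.
    static Map<String, Float> distribute(float pageScore, List<String> targets, int allCount) {
        Map<String, Float> shares = new LinkedHashMap<>();
        if (allCount <= 0) {
            return shares; // nothing to distribute to
        }
        float share = pageScore / allCount;
        for (String target : targets) {
            shares.put(target, share);
        }
        return shares;
    }

    public static void main(String[] args) {
        List<String> targets = Arrays.asList("http://a.example/", "http://b.example/");
        // The page had 4 outlinks in total; only 2 survived URL filtering.
        System.out.println(distribute(1.0f, targets, 4));
    }
}
```

Dividing by `allCount` rather than by `targets.size()` means that links dropped by filtering do not inflate the share of the remaining ones; that is a design choice of this sketch, not a statement about any particular filter.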
public float generatorSortValue(Text url,
CrawlDatum datum,
float initSort) throws ScoringFilterException
This method prepares a sort value that is used to sort pages and select the
top N scoring pages during fetchlist generation. |
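One way to picture the role of the sort value is the sketch below. It is not the generator's actual code: `GeneratorSortSketch`, `sortValue`, and `topN` are hypothetical names, URLs are plain strings, and the sort value is assumed to be the page score multiplied by `initSort`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GeneratorSortSketch {

    // Assumed policy: scale the stored page score by the initial sort value.
    static float sortValue(float score, float initSort) {
        return score * initSort;
    }

    // Orders URLs by descending sort value and keeps the first n.
    static List<String> topN(Map<String, Float> scores, float initSort, int n) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> Float.compare(
                sortValue(b.getValue(), initSort),
                sortValue(a.getValue(), initSort)));
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            selected.add(entries.get(i).getKey());
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("http://a.example/", 0.5f);
        scores.put("http://b.example/", 2.0f);
        scores.put("http://c.example/", 1.0f);
        System.out.println(topN(scores, 1.0f, 2));
    }
}
```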
public float indexerScore(Text url,
NutchDocument doc,
CrawlDatum dbDatum,
CrawlDatum fetchDatum,
Parse parse,
Inlinks inlinks,
float initScore) throws ScoringFilterException
This method calculates a Lucene document boost. |
public void initialScore(Text url,
CrawlDatum datum) throws ScoringFilterException
Set an initial score for newly discovered pages. Note: newly discovered pages
have at least one inlink with its score contribution, so filter implementations
may choose to set the initial score to zero (unknown value), and then the inlink
score contribution will set the "real" value of the new page. |
public void injectedScore(Text url,
CrawlDatum datum) throws ScoringFilterException
Set an initial score for newly injected pages. Note: newly injected pages
may have no inlinks, so filter implementations may wish to set this
score to a non-zero value, to give newly injected pages some initial
credit. |
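The difference between the two starting points can be sketched as follows. The constant `INJECTED_SCORE` and the class `InitialScoreSketch` are hypothetical, and the value 1.0 is an arbitrary assumption chosen for the example; the sketch only mirrors the policy described in the two notes above.

```java
public class InitialScoreSketch {

    // Hypothetical non-zero credit for pages that enter the crawl by injection.
    static final float INJECTED_SCORE = 1.0f;

    // Newly discovered page: start at zero and let inlinks supply the value.
    static float initialScore() {
        return 0.0f;
    }

    // Newly injected page: may have no inlinks, so start with some credit.
    static float injectedScore() {
        return INJECTED_SCORE;
    }

    // Adds one inlink's score contribution to a starting score.
    static float afterInlink(float start, float contribution) {
        return start + contribution;
    }

    public static void main(String[] args) {
        System.out.println("discovered: " + afterInlink(initialScore(), 0.25f));
        System.out.println("injected:   " + injectedScore());
    }
}
```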
public void passScoreAfterParsing(Text url,
Content content,
Parse parse) throws ScoringFilterException
Currently, part of the score distribution is performed using only data that comes
from the parsing process. This method ensures that score data is present
in those steps. |
public void passScoreBeforeParsing(Text url,
CrawlDatum datum,
Content content) throws ScoringFilterException
This method takes all relevant score information from the current datum
(coming from a generated fetchlist) and stores it into
org.apache.nutch.protocol.Content metadata.
This is needed in order to pass these values to the mechanism that distributes them
to outlinked pages. |
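The hand-off through metadata can be illustrated with a small sketch. Here a `Map<String, String>` stands in for the content metadata, and the key `sketch.score`, the class `ScoreMetadataSketch`, and both helper methods are hypothetical names introduced only for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class ScoreMetadataSketch {

    // Hypothetical metadata key; a real filter defines its own.
    static final String SCORE_KEY = "sketch.score";

    // Before parsing: copy the datum's score into the content metadata.
    static void storeScore(float datumScore, Map<String, String> contentMeta) {
        contentMeta.put(SCORE_KEY, Float.toString(datumScore));
    }

    // Later steps: read the score back, using a fallback if it is missing or malformed.
    static float readScore(Map<String, String> meta, float fallback) {
        String raw = meta.get(SCORE_KEY);
        if (raw == null) {
            return fallback;
        }
        try {
            return Float.parseFloat(raw);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Map<String, String> contentMeta = new HashMap<>();
        storeScore(0.75f, contentMeta);
        System.out.println(readScore(contentMeta, 0.0f));
    }
}
```

Storing the score as a string keeps the sketch compatible with string-valued metadata; the fallback makes a missing value explicit instead of failing in a later step.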
public void updateDbScore(Text url,
CrawlDatum old,
CrawlDatum datum,
List inlinked) throws ScoringFilterException
This method calculates a new score for the CrawlDatum during the CrawlDb update, based on the
initial value of the original CrawlDatum and on the score values contributed by
inlinked pages. |
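A minimal sketch of such an update, under the assumption that the new score is simply the original score plus the sum of all inlink contributions, is shown below. `UpdateDbScoreSketch` and `updatedScore` are hypothetical names, floats stand in for the CrawlDatum objects, and an actual filter may weight or cap the contributions differently.

```java
import java.util.Arrays;
import java.util.List;

public class UpdateDbScoreSketch {

    // Assumed policy: original score plus every contribution from inlinked pages.
    static float updatedScore(float originalScore, List<Float> inlinkContributions) {
        float total = originalScore;
        for (float contribution : inlinkContributions) {
            total += contribution;
        }
        return total;
    }

    public static void main(String[] args) {
        // A page with score 0.5 receives contributions from two inlinking pages.
        System.out.println(updatedScore(0.5f, Arrays.asList(0.25f, 0.25f)));
    }
}
```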