org.apache.nutch.crawl
public class: MD5Signature
java.lang.Object
org.apache.nutch.crawl.Signature
org.apache.nutch.crawl.MD5Signature
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable
Default implementation of a page signature. It calculates an MD5 hash
of the raw binary content of a page. In case there is no content, it
calculates a hash from the page's URL.
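The behavior described above can be sketched without any Nutch or Hadoop dependencies. This is a minimal illustration only: the class `SignatureSketch` and its methods `signature` and `hex` are hypothetical names, the JDK's `MessageDigest` stands in for Hadoop's `MD5Hash`, and UTF-8 is an assumed encoding for the URL fallback.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SignatureSketch {
    // Hypothetical helper, not part of Nutch: MD5 of the raw content bytes,
    // falling back to the URL bytes when the content is null.
    static byte[] signature(byte[] content, String url) {
        byte[] data = (content != null) ? content : url.getBytes(StandardCharsets.UTF_8);
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Render a digest as lowercase hex for display or comparison.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] page = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(signature(page, "http://example.com/")));
        // No content: the signature is derived from the URL instead.
        System.out.println(hex(signature(null, "http://example.com/")));
    }
}
```

Two pages with byte-identical content get the same signature regardless of URL, which is what makes the value usable for duplicate detection.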
- author: Andrzej Bialecki <ab@getopt.org>
Method from org.apache.nutch.crawl.MD5Signature Summary:
calculate
Method from org.apache.nutch.crawl.MD5Signature Detail:
public byte[] calculate(Content content, Parse parse) {
    // Use the raw page bytes; fall back to the URL when there is no content.
    byte[] data = content.getContent();
    if (data == null) data = content.getUrl().getBytes();
    // Digest the bytes directly rather than a string built from them,
    // so the signature depends only on the content, as documented above.
    return MD5Hash.digest(data).getDigest();
}
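One Java detail matters when hashing a `byte[]`: `StringBuilder.append` has no `byte[]` overload, so passing an array resolves to `append(Object)` and appends the array's identity string (of the form `[B@` followed by a hash code) instead of its contents. The sketch below illustrates this; `ByteArrayAppendDemo`, `viaStringBuilder`, and `md5` are hypothetical names, and the JDK's `MessageDigest` is used in place of Hadoop's `MD5Hash`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ByteArrayAppendDemo {
    // Appending a byte[] goes through append(Object), i.e. String.valueOf(array),
    // which yields the identity string "[B@<hash>", not the bytes themselves.
    static String viaStringBuilder(byte[] data) {
        return new StringBuilder().append(data).toString();
    }

    // Hashing the bytes directly depends only on the array's contents.
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] first = "same page".getBytes(StandardCharsets.UTF_8);
        byte[] second = "same page".getBytes(StandardCharsets.UTF_8);

        // Identity strings; these normally differ between two distinct arrays.
        System.out.println(viaStringBuilder(first));
        System.out.println(viaStringBuilder(second));

        // Content-based digests of equal content are equal.
        System.out.println(Arrays.equals(md5(first), md5(second)));
    }
}
```

A signature built from the identity string would vary between runs and between array instances, so identical pages would not be recognized as duplicates.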