org.apache.nutch.crawl (nutch-1.0)

Interfaces:

FetchSchedule   This interface defines the contract for implementations that manipulate fetch times and re-fetch intervals.
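
To make the idea of "manipulating fetch times and re-fetch intervals" concrete, here is a minimal sketch of what such a contract could look like. It is not the Nutch API: `PageRecord`, `SimpleFetchSchedule`, and `FixedIntervalSchedule` are hypothetical names invented for illustration, and the real interface works on CrawlDatum with different method signatures.

```java
public class FetchScheduleSketch {

    /** Hypothetical stand-in for a crawl record; not Nutch's CrawlDatum. */
    public static final class PageRecord {
        public long nextFetchMs;  // time at which the page is next due for fetching
        public long intervalSec;  // current re-fetch interval, in seconds
    }

    /** Hypothetical, simplified contract: set, update, and query fetch times. */
    public interface SimpleFetchSchedule {
        void initialize(PageRecord page, long nowMs);
        void afterFetch(PageRecord page, long fetchedAtMs);
        boolean shouldFetch(PageRecord page, long nowMs);
    }

    /** Fixed-interval implementation: every page is re-fetched after the same delay. */
    public static final class FixedIntervalSchedule implements SimpleFetchSchedule {
        private final long intervalSec;

        public FixedIntervalSchedule(long intervalSec) {
            this.intervalSec = intervalSec;
        }

        @Override
        public void initialize(PageRecord page, long nowMs) {
            page.intervalSec = intervalSec;
            page.nextFetchMs = nowMs; // a newly added page is due immediately
        }

        @Override
        public void afterFetch(PageRecord page, long fetchedAtMs) {
            // Push the next fetch one full interval past the fetch that just happened.
            page.nextFetchMs = fetchedAtMs + page.intervalSec * 1000L;
        }

        @Override
        public boolean shouldFetch(PageRecord page, long nowMs) {
            return nowMs >= page.nextFetchMs;
        }
    }

    public static void main(String[] args) {
        SimpleFetchSchedule schedule = new FixedIntervalSchedule(60);
        PageRecord page = new PageRecord();
        schedule.initialize(page, 0L);
        System.out.println("due at t=0s:  " + schedule.shouldFetch(page, 0L));
        schedule.afterFetch(page, 0L);
        System.out.println("due at t=30s: " + schedule.shouldFetch(page, 30_000L));
        System.out.println("due at t=60s: " + schedule.shouldFetch(page, 60_000L));
    }
}
```

Keeping the schedule behind an interface lets a crawler swap a fixed-interval policy for an adaptive one without touching the code that asks whether a page is due.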

Abstract Classes:

AbstractFetchSchedule   This class provides common methods for implementations of FetchSchedule.
Signature
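
The class list below names MD5Signature as the default implementation of a page signature. As a rough illustration of that pattern, the sketch below pairs an abstract signature type with an MD5-based subclass. `PageSignature`, `Md5PageSignature`, and `toHex` are hypothetical names, and hashing only the raw content bytes is an assumption rather than a description of what the Nutch classes do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SignatureSketch {

    /** Hypothetical abstract signature: maps page content to a fixed-size fingerprint. */
    public abstract static class PageSignature {
        public abstract byte[] calculate(byte[] content);
    }

    /** MD5-based fingerprint; assumes only the raw content bytes are hashed. */
    public static final class Md5PageSignature extends PageSignature {
        @Override
        public byte[] calculate(byte[] content) {
            try {
                return MessageDigest.getInstance("MD5").digest(content);
            } catch (NoSuchAlgorithmException e) {
                // Every Java platform is required to provide MD5.
                throw new IllegalStateException("MD5 not available", e);
            }
        }
    }

    /** Renders a digest as lowercase hexadecimal. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PageSignature signature = new Md5PageSignature();
        byte[] pageA = "<html>same body</html>".getBytes(StandardCharsets.UTF_8);
        byte[] pageB = "<html>other body</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(signature.calculate(pageA)));
        System.out.println(toHex(signature.calculate(pageB)));
    }
}
```

Two pages with identical bytes get identical signatures, which is what makes a signature usable for detecting unchanged or duplicate content. An exact hash like MD5 changes on any byte difference, so a fuzzier scheme is needed when near-duplicates should also match.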

Classes:

AdaptiveFetchSchedule   This class implements an adaptive re-fetch algorithm.
Crawl
CrawlDatum
CrawlDatum.Comparator   A Comparator optimized for CrawlDatum.
CrawlDb   This class takes the output of the fetcher and updates the crawldb accordingly.
CrawlDbFilter   This class provides a way to separate the URL normalization and filtering steps from the rest of the CrawlDb manipulation code.
CrawlDbMerger   This tool merges several CrawlDbs into one, optionally filtering URLs through the current URLFilters to skip prohibited pages.
CrawlDbMerger.Merger
CrawlDbReader   Read utility for the CrawlDb.
CrawlDbReader.CrawlDatumCsvOutputFormat
CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter
CrawlDbReader.CrawlDbStatCombiner
CrawlDbReader.CrawlDbStatMapper
CrawlDbReader.CrawlDbStatReducer
CrawlDbReader.CrawlDbTopNMapper
CrawlDbReader.CrawlDbTopNReducer
CrawlDbReducer   Merge new page entries with existing entries.
DefaultFetchSchedule   This class implements the default re-fetch schedule.
FetchScheduleFactory   Creates and caches a FetchSchedule implementation.
Generator   Generates a subset of a crawl db to fetch.
Generator.CrawlDbUpdater   Update the CrawlDb so that the next generate won't include the same URLs.
Generator.DecreasingFloatComparator
Generator.HashComparator   Sort fetch lists by hash of URL.
Generator.PartitionReducer
Generator.Selector   Selects entries due for fetch.
Generator.SelectorEntry
Generator.SelectorInverseMapper
Injector   This class takes a flat file of URLs and adds them to the set of pages to be crawled.
Injector.InjectMapper   Normalize and filter injected URLs.
Injector.InjectReducer   Combine multiple new entries for a URL.
Inlink
Inlinks   A list of Inlink objects.
LinkDb   Maintains an inverted link map, listing incoming links for each URL.
LinkDbFilter   This class provides a way to separate the URL normalization and filtering steps from the rest of the LinkDb manipulation code.
LinkDbMerger   This tool merges several LinkDbs into one, optionally filtering URLs through the current URLFilters to skip prohibited URLs and links.
LinkDbReader
MD5Signature   Default implementation of a page signature.
MapWritable   A writable map that behaves similarly to java.util.HashMap.
MapWritable.ClassIdEntry   Container for id/class tuples.
MapWritable.KeyValueEntry   An entry that holds a writable key and value.
NutchWritable
PartitionUrlByHost   Partition URLs by hostname.
SignatureComparator
SignatureFactory   Factory class that instantiates a Signature implementation according to the current Configuration.
TextProfileSignature   An implementation of a page signature.
TextProfileSignature.Token
TextProfileSignature.TokenComparator
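
The listing describes AdaptiveFetchSchedule only as "an adaptive re-fetch algorithm". One common way to adapt a re-fetch interval is to shorten it when a page was found modified and lengthen it when it was not, clamped to a minimum and maximum. The sketch below illustrates that general idea only. The method name, the rate parameters, and the bounds are assumptions made for the example, not values or signatures taken from the Nutch source.

```java
public class AdaptiveIntervalSketch {

    /**
     * Computes the next re-fetch interval in seconds.
     * Hypothetical helper: shrinks the interval by decRate if the page changed,
     * grows it by incRate if it did not, then clamps to [minSec, maxSec].
     */
    public static long nextInterval(long currentSec, boolean modified,
                                    double incRate, double decRate,
                                    long minSec, long maxSec) {
        double scaled = modified
                ? currentSec * (1.0 - decRate)   // page changed: check again sooner
                : currentSec * (1.0 + incRate);  // page unchanged: back off
        long rounded = Math.round(scaled);
        return Math.max(minSec, Math.min(maxSec, rounded));
    }

    public static void main(String[] args) {
        // Illustrative parameters only (assumed, not Nutch defaults).
        double incRate = 0.4;
        double decRate = 0.2;
        long minSec = 60;
        long maxSec = 100_000;

        long interval = 1_000;
        boolean[] observedChanges = {true, true, false, false, false};
        for (boolean modified : observedChanges) {
            interval = nextInterval(interval, modified, incRate, decRate, minSec, maxSec);
            System.out.println((modified ? "modified   -> " : "unmodified -> ") + interval + " s");
        }
    }
}
```

The clamp matters in practice: without a lower bound a frequently changing page would be fetched almost continuously, and without an upper bound a static page would effectively never be revisited.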