org.apache.nutch.crawl (nutch-1.0)
public class CrawlDatum
java.lang.Object
   org.apache.nutch.crawl.CrawlDatum

All Implemented Interfaces:
    Cloneable, WritableComparable

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Nested Class Summary:
public static class  CrawlDatum.Comparator  A Comparator optimized for CrawlDatum. 
Field Summary
public static final  String GENERATE_DIR_NAME     
public static final  String FETCH_DIR_NAME     
public static final  String PARSE_DIR_NAME     
public static final  byte STATUS_DB_UNFETCHED    Page was not fetched yet. 
public static final  byte STATUS_DB_FETCHED    Page was successfully fetched. 
public static final  byte STATUS_DB_GONE    Page no longer exists. 
public static final  byte STATUS_DB_REDIR_TEMP    Page temporarily redirects to another page. 
public static final  byte STATUS_DB_REDIR_PERM    Page permanently redirects to another page. 
public static final  byte STATUS_DB_NOTMODIFIED    Page was successfully fetched and found not modified. 
public static final  byte STATUS_DB_MAX    Maximum value of DB-related status. 
public static final  byte STATUS_FETCH_SUCCESS    Fetching was successful. 
public static final  byte STATUS_FETCH_RETRY    Fetching unsuccessful, needs to be retried (transient errors). 
public static final  byte STATUS_FETCH_REDIR_TEMP    Fetching temporarily redirected to another page. 
public static final  byte STATUS_FETCH_REDIR_PERM    Fetching permanently redirected to another page. 
public static final  byte STATUS_FETCH_GONE    Fetching unsuccessful - page is gone. 
public static final  byte STATUS_FETCH_NOTMODIFIED    Fetching successful - page is not modified. 
public static final  byte STATUS_FETCH_MAX    Maximum value of fetch-related status. 
public static final  byte STATUS_SIGNATURE    Page signature. 
public static final  byte STATUS_INJECTED    Page was newly injected. 
public static final  byte STATUS_LINKED    Page discovered through a link. 
public static final  HashMap statNames     
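The status constants fall into DB-related values (bounded by STATUS_DB_MAX), fetch-related values (bounded by STATUS_FETCH_MAX) and a few others, and statNames presumably maps each status byte to a readable name that getStatusName returns. The standalone sketch below models that lookup. The class name StatusNames, the byte values, the name strings and the "unknown" fallback are all assumptions for illustration, not copied from the Nutch source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of CrawlDatum's status constants and name lookup.
// The byte values and name strings are illustrative assumptions.
class StatusNames {
    static final byte STATUS_DB_UNFETCHED = 0x01;
    static final byte STATUS_DB_FETCHED = 0x02;
    static final byte STATUS_DB_GONE = 0x03;
    static final byte STATUS_FETCH_SUCCESS = 0x21;
    static final byte STATUS_FETCH_RETRY = 0x22;
    static final byte STATUS_INJECTED = 0x42;

    // Status byte -> readable name, in the spirit of statNames.
    static final Map<Byte, String> STAT_NAMES = new HashMap<>();

    static {
        STAT_NAMES.put(STATUS_DB_UNFETCHED, "db_unfetched");
        STAT_NAMES.put(STATUS_DB_FETCHED, "db_fetched");
        STAT_NAMES.put(STATUS_DB_GONE, "db_gone");
        STAT_NAMES.put(STATUS_FETCH_SUCCESS, "fetch_success");
        STAT_NAMES.put(STATUS_FETCH_RETRY, "fetch_retry");
        STAT_NAMES.put(STATUS_INJECTED, "injected");
    }

    // Falls back to "unknown" for bytes that have no registered name.
    static String getStatusName(byte value) {
        String name = STAT_NAMES.get(value);
        return name == null ? "unknown" : name;
    }

    public static void main(String[] args) {
        System.out.println(getStatusName(STATUS_DB_FETCHED)); // db_fetched
        System.out.println(getStatusName((byte) 0x7f));       // unknown
    }
}
```

Keying the map by the boxed Byte keeps the lookup a single hash probe; an array indexed by the status value would work just as well for a small, dense range.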
Constructors:
 public CrawlDatum() 
 public CrawlDatum(int status,
    int fetchInterval) 
 public CrawlDatum(int status,
    int fetchInterval,
    float score) 
Method Summary for org.apache.nutch.crawl.CrawlDatum:
clone, compareTo, equals, getFetchInterval, getFetchTime, getMetaData, getModifiedTime, getRetriesSinceFetch, getScore, getSignature, getStatus, getStatusName, hasDbStatus, hasFetchStatus, hashCode, putAllMetaData, read, readFields, set, setFetchInterval, setFetchInterval, setFetchTime, setMetaData, setModifiedTime, setRetriesSinceFetch, setScore, setSignature, setStatus, toString, write
Methods from java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Method Detail for org.apache.nutch.crawl.CrawlDatum:
 public Object clone() 
 public int compareTo(CrawlDatum that) 
    Sort by decreasing score.
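A minimal sketch of the ordering compareTo describes, using a hypothetical ScoredDatum class that carries only a URL and a score. It compares scores only; the real method may consult other fields when two scores are equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for CrawlDatum that keeps only a URL and a score.
class ScoredDatum implements Comparable<ScoredDatum> {
    final String url;
    final float score;

    ScoredDatum(String url, float score) {
        this.url = url;
        this.score = score;
    }

    // Decreasing score: the arguments are swapped so higher scores sort first.
    @Override
    public int compareTo(ScoredDatum that) {
        return Float.compare(that.score, this.score);
    }

    // Sorts a copy of the input and returns the URLs in sorted order.
    static List<String> sortedUrls(List<ScoredDatum> data) {
        List<ScoredDatum> copy = new ArrayList<>(data);
        Collections.sort(copy);
        List<String> urls = new ArrayList<>();
        for (ScoredDatum d : copy) {
            urls.add(d.url);
        }
        return urls;
    }

    public static void main(String[] args) {
        List<ScoredDatum> data = Arrays.asList(
                new ScoredDatum("http://a.example/", 0.5f),
                new ScoredDatum("http://b.example/", 2.0f),
                new ScoredDatum("http://c.example/", 1.0f));
        // [http://b.example/, http://c.example/, http://a.example/]
        System.out.println(sortedUrls(data));
    }
}
```

Using Float.compare rather than subtracting scores avoids rounding surprises and gives a consistent order for NaN values.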
 public boolean equals(Object o) 
 public int getFetchInterval() 
 public long getFetchTime() 
    Returns either the time of the last fetch or the next scheduled fetch time, depending on whether the Fetcher or the CrawlDbReducer set it.
 public MapWritable getMetaData() 
    Returns the MapWritable if it was set or read in readFields(DataInput); returns an empty map if this CrawlDatum was freshly created (the map is lazily instantiated).
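The lazy instantiation mentioned for getMetaData can be sketched as follows. This is a hypothetical LazyMetaDatum class in which a plain Map&lt;String, String&gt; stands in for Hadoop's MapWritable; it also models putAllMetaData, assuming that keys present in both instances take the other instance's value (ordinary Map.putAll semantics).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lazily instantiated metadata. A plain
// Map<String, String> stands in for Hadoop's MapWritable.
class LazyMetaDatum {
    // Stays null until someone actually asks for the metadata.
    private Map<String, String> metaData;

    // Allocates an empty map on first access, then keeps returning the same map.
    Map<String, String> getMetaData() {
        if (metaData == null) {
            metaData = new HashMap<>();
        }
        return metaData;
    }

    // Copies every entry of other's metadata into this instance; keys present
    // in both take other's value. Nothing is copied if other has no metadata.
    void putAllMetaData(LazyMetaDatum other) {
        if (other.metaData != null) {
            getMetaData().putAll(other.metaData);
        }
    }

    // True once the backing map has been created.
    boolean hasAllocatedMetaData() {
        return metaData != null;
    }
}
```

Deferring the allocation means records that never carry metadata cost no extra map object, which matters when many such records are held or created during a crawl.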
 public long getModifiedTime() 
 public byte getRetriesSinceFetch() 
 public float getScore() 
 public byte[] getSignature() 
 public byte getStatus() 
 public static String getStatusName(byte value) 
 public static boolean hasDbStatus(CrawlDatum datum) 
 public static boolean hasFetchStatus(CrawlDatum datum) 
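Given the STATUS_DB_MAX and STATUS_FETCH_MAX bounds in the field summary, hasDbStatus and hasFetchStatus can plausibly be read as range checks. The sketch below is a hypothetical StatusRanges class that takes a raw status byte instead of a CrawlDatum; the bound values 0x1f and 0x3f are assumptions, not taken from this page.

```java
// Hypothetical model of the range checks behind hasDbStatus and hasFetchStatus.
// The two bounds are assumed values for STATUS_DB_MAX and STATUS_FETCH_MAX.
class StatusRanges {
    static final byte STATUS_DB_MAX = 0x1f;
    static final byte STATUS_FETCH_MAX = 0x3f;

    // DB statuses occupy the range up to and including STATUS_DB_MAX.
    static boolean hasDbStatus(byte status) {
        return status <= STATUS_DB_MAX;
    }

    // Fetch statuses sit strictly above STATUS_DB_MAX, up to STATUS_FETCH_MAX.
    static boolean hasFetchStatus(byte status) {
        return status > STATUS_DB_MAX && status <= STATUS_FETCH_MAX;
    }

    public static void main(String[] args) {
        byte fetched = 0x02;      // assumed value of STATUS_DB_FETCHED
        byte fetchSuccess = 0x21; // assumed value of STATUS_FETCH_SUCCESS
        System.out.println(hasDbStatus(fetched) + " " + hasFetchStatus(fetched));           // true false
        System.out.println(hasDbStatus(fetchSuccess) + " " + hasFetchStatus(fetchSuccess)); // false true
    }
}
```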
 public int hashCode() 
 public  void putAllMetaData(CrawlDatum other) 
    Add all metadata from other CrawlDatum to this CrawlDatum.
 public static CrawlDatum read(DataInput in) throws IOException 
 public  void readFields(DataInput in) throws IOException 
 public  void set(CrawlDatum that) 
    Copy the contents of another instance into this instance.
 public  void setFetchInterval(int fetchInterval) 
 public  void setFetchInterval(float fetchInterval) 
 public  void setFetchTime(long fetchTime) 
    Sets either the time of the last fetch or the next scheduled fetch time, depending on whether the Fetcher or the CrawlDbReducer is setting it.
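Because the same fetch-time field holds either the last fetch time or the next one, a scheduler has to derive the next value from the last one and the fetch interval. The sketch below shows that arithmetic in a hypothetical FetchSchedule class. It assumes fetch times are milliseconds since the epoch and the int fetch interval is in seconds, and it guesses that the float overload of setFetchInterval rounds to the nearest second; none of these units are stated on this page.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical scheduling arithmetic around setFetchTime/getFetchTime.
// Assumes fetch times are milliseconds since the epoch and the fetch
// interval is a whole number of seconds.
class FetchSchedule {
    // Next fetch time = time of the last fetch + fetch interval.
    static long nextFetchTime(long lastFetchMillis, int fetchIntervalSeconds) {
        return lastFetchMillis + TimeUnit.SECONDS.toMillis(fetchIntervalSeconds);
    }

    // Assumed behaviour of the float overload: round to the nearest second.
    static int toIntervalSeconds(float fetchInterval) {
        return Math.round(fetchInterval);
    }

    // A page is due once its scheduled fetch time is not in the future.
    static boolean isDue(long fetchTimeMillis, long nowMillis) {
        return fetchTimeMillis <= nowMillis;
    }

    public static void main(String[] args) {
        long last = System.currentTimeMillis();
        int thirtyDays = toIntervalSeconds(30f * 24 * 60 * 60);
        long next = nextFetchTime(last, thirtyDays);
        System.out.println("due now? " + isDue(next, last)); // due now? false
    }
}
```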
 public  void setMetaData(MapWritable mapWritable) 
 public  void setModifiedTime(long modifiedTime) 
 public  void setRetriesSinceFetch(int retries) 
 public  void setScore(float score) 
 public  void setSignature(byte[] signature) 
 public  void setStatus(int status) 
 public String toString() 
 public  void write(DataOutput out) throws IOException
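write, readFields and the static read factory follow the Hadoop Writable pattern: fields are written in a fixed order and read back in the same order. The sketch below round-trips a hypothetical MiniDatum record through an in-memory buffer. Its four fields and their order are illustrative only and are not CrawlDatum's actual serialized layout, which carries more data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, trimmed-down record following the Writable pattern.
// The field set and order are illustrative, not CrawlDatum's real layout.
class MiniDatum {
    byte status;
    long fetchTime;
    int fetchInterval;
    float score;

    // Serializes the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeByte(status);
        out.writeLong(fetchTime);
        out.writeInt(fetchInterval);
        out.writeFloat(score);
    }

    // Reads the fields back in exactly the order write() produced them,
    // overwriting this instance (so callers can reuse one object).
    void readFields(DataInput in) throws IOException {
        status = in.readByte();
        fetchTime = in.readLong();
        fetchInterval = in.readInt();
        score = in.readFloat();
    }

    // Static factory in the style of read(DataInput): new instance + readFields.
    static MiniDatum read(DataInput in) throws IOException {
        MiniDatum d = new MiniDatum();
        d.readFields(in);
        return d;
    }

    // Writes to an in-memory buffer and reads a fresh copy back.
    static MiniDatum roundTrip(MiniDatum d) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            d.write(out);
            out.flush();
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MiniDatum d = new MiniDatum();
        d.status = 0x02;
        d.score = 1.5f;
        MiniDatum copy = roundTrip(d);
        System.out.println(copy.status + " " + copy.score); // 2 1.5
    }
}
```

Keeping readFields as an instance method that overwrites every field is what allows Hadoop-style code to reuse a single object across many records, while read offers a convenient allocate-and-fill shortcut.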