org.apache.nutch.clustering
public interface OnlineClusterer

All Superinterfaces:
    Pluggable

All Known Implementing Classes:
    Clusterer

An extension point interface for online search results clustering algorithms.

Here, online search results clustering means a clusterer that works on the set of HitDetails retrieved for a query and produces a set of HitsCluster objects, which can be displayed to give the user more insight into the topics found in the results.

Other clustering options include predefined categories and off-line preclustered groups, but I do not investigate those any further here.

Field Summary
public static final String X_POINT_ID
    The name of the extension point.

Method Summary
clusterHits

Method Detail
 public HitsCluster[] clusterHits(HitDetails[] hitDetails,
    String[] descriptions)
    Clusters an array of hits (HitDetails objects) and their previously extracted summaries (Strings).

    The arguments to this method may seem very low-level, but they are by-products of the regular search process, so we simply reuse them instead of duplicating part of the usual Nutch functionality. Other ideas are welcome.

    This method must be thread-safe: many threads may invoke it concurrently on the same clusterer instance.
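
One simple way to meet the thread-safety requirement is to keep the implementation stateless, so that each call works only on its arguments and local variables. The following is a minimal sketch of that idea, not the Nutch implementation. The HitDetails, HitsCluster and OnlineClusterer types below are simplified stand-ins declared only so the example compiles on its own, FirstWordClusterer is a hypothetical class, and the sketch assumes that descriptions[i] is the summary for hitDetails[i].

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified stand-ins so the sketch is self-contained.
// These are NOT the real org.apache.nutch classes.
final class HitDetails {
    private final String url;
    HitDetails(String url) { this.url = url; }
    String getUrl() { return url; }
}

final class HitsCluster {
    private final String label;
    private final HitDetails[] hits;
    HitsCluster(String label, HitDetails[] hits) {
        this.label = label;
        this.hits = hits;
    }
    String getLabel() { return label; }
    HitDetails[] getHits() { return hits; }
}

interface OnlineClusterer {
    HitsCluster[] clusterHits(HitDetails[] hitDetails, String[] descriptions);
}

// Hypothetical clusterer: groups hits by the first word of their summary.
final class FirstWordClusterer implements OnlineClusterer {
    // No mutable fields: all working state lives in locals, so concurrent
    // calls on the same instance cannot interfere with each other.
    @Override
    public HitsCluster[] clusterHits(HitDetails[] hitDetails, String[] descriptions) {
        // Assumption: the two arrays are parallel (one summary per hit).
        if (hitDetails.length != descriptions.length) {
            throw new IllegalArgumentException("one description per hit is required");
        }
        // LinkedHashMap keeps clusters in order of first appearance.
        Map<String, List<HitDetails>> groups = new LinkedHashMap<>();
        for (int i = 0; i < hitDetails.length; i++) {
            String label = firstWord(descriptions[i]);
            groups.computeIfAbsent(label, k -> new ArrayList<>()).add(hitDetails[i]);
        }
        HitsCluster[] clusters = new HitsCluster[groups.size()];
        int n = 0;
        for (Map.Entry<String, List<HitDetails>> e : groups.entrySet()) {
            clusters[n++] = new HitsCluster(e.getKey(), e.getValue().toArray(new HitDetails[0]));
        }
        return clusters;
    }

    // Returns the lower-cased first token, or "(other)" for empty summaries.
    private static String firstWord(String text) {
        String trimmed = text == null ? "" : text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? "(other)" : trimmed.split("\\s+")[0];
    }
}
```

Because FirstWordClusterer has no fields, a single instance can be shared by all search threads without locking. An implementation that needs caches or other shared mutable state would have to synchronize access to it or use concurrent collections instead.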