org.apache.cocoon.transformation
public class: LuceneIndexTransformer
java.lang.Object
   org.apache.avalon.framework.logger.AbstractLogEnabled
      org.apache.cocoon.xml.AbstractXMLProducer
         org.apache.cocoon.xml.AbstractXMLPipe
            org.apache.cocoon.transformation.AbstractTransformer
               org.apache.cocoon.transformation.LuceneIndexTransformer

All Implemented Interfaces:
    CacheableProcessingComponent, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.context.Contextualizable, Transformer, XMLPipe, org.apache.avalon.excalibur.pool.Recyclable, XMLProducer

A Lucene index creation transformer.

This transformer reads a document with elements in the namespace http://apache.org/cocoon/lucene/1.0 and either creates a new Lucene index or updates an existing one.

It has several parameters, which can be set in the sitemap component configuration, as parameters to the transformation step in the pipeline, or as attributes of the root element in the source XML document. The source document overrides the transformation parameters, which in turn override any configuration parameters.
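
The override order described above can be illustrated with a minimal sketch. The class and method names (SettingPrecedence, resolve) are hypothetical and not part of Cocoon; the sketch only models the three levels as plain maps and is not how the transformer stores its settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cocoon: models the three-level override order.
final class SettingPrecedence {

    /** Returns the effective settings: configuration < pipeline parameters < document attributes. */
    static Map<String, String> resolve(Map<String, String> componentConfig,
                                       Map<String, String> pipelineParameters,
                                       Map<String, String> documentAttributes) {
        // Start from the sitemap component configuration (lowest priority).
        Map<String, String> effective = new LinkedHashMap<>(componentConfig);
        // Parameters on the transformation step replace configuration values.
        effective.putAll(pipelineParameters);
        // Attributes of the root element in the source document replace both.
        effective.putAll(documentAttributes);
        return effective;
    }
}
```

A setting that is absent at the higher levels simply keeps the value from the level below it.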

directory

Location of the directory where index files are stored. This path is relative to the Cocoon work directory.
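
As a rough illustration, resolving a relative directory value against the work directory might look like the sketch below. IndexDirectoryResolver is a hypothetical name, and the sketch assumes the value is a relative path; how the transformer treats absolute paths is not stated here.

```java
import java.io.File;

// Hypothetical helper, not part of Cocoon: resolves "directory" against the work directory.
final class IndexDirectoryResolver {

    /** Assumes 'directory' is a relative path such as "index". */
    static File resolve(File workDir, String directory) {
        // File(parent, child) joins the two paths with the platform separator.
        return new File(workDir, directory);
    }
}
```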

create

This attribute controls whether the index is recreated.

  • If create="false" and the index already exists then the index will be updated. Any documents which had already been indexed will be removed from the index and reinserted.

  • If the index does not exist then it will be created even if create="false".

  • If create="true" then any existing index will be destroyed and a new index created. If you are rebuilding your entire index then you should set create="true" because the indexer doesn't need to remove old documents from the index, so it will be faster.
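
The three cases above reduce to two yes/no decisions, sketched here. CreatePolicy and its method names are hypothetical and only restate the rules in code; they are not taken from the transformer's source.

```java
// Hypothetical helper, not part of Cocoon: restates the rules for the "create" attribute.
final class CreatePolicy {

    /** True when a new index is written, either because create="true" or because none exists yet. */
    static boolean startsFreshIndex(boolean create, boolean indexExists) {
        return create || !indexExists;
    }

    /** True when previously indexed copies of a document must be removed before reinsertion. */
    static boolean mustRemoveOldDocuments(boolean create, boolean indexExists) {
        // Only an update of an existing index can contain stale copies.
        return !create && indexExists;
    }
}
```

The second method is where the speed difference comes from: a full rebuild with create="true" never has to look up and delete old documents.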

max-field-length

Maximum number of terms to index in a field. As far as the index is concerned, the document is effectively truncated at this point. The default value, 10k, may not be sufficient for large documents.
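
The effect of the limit can be pictured with the sketch below. FieldTruncation is a hypothetical name, and splitting on whitespace is a simplifying assumption: a real Lucene analyzer tokenizes text differently, so actual term counts will not match exactly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cocoon or Lucene: shows truncation at a term limit.
final class FieldTruncation {

    /** Returns at most maxFieldLength whitespace-separated terms; the rest is never indexed. */
    static List<String> indexedTerms(String fieldText, int maxFieldLength) {
        List<String> terms = new ArrayList<>();
        for (String term : fieldText.trim().split("\\s+")) {
            if (terms.size() >= maxFieldLength) {
                break; // everything past the limit is invisible to searches
            }
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

Text beyond the limit is still present in the source document, but a search for a term that occurs only there would not find it.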

analyzer

Class name of the Lucene text analyzer to use. Typically depends on the language of the text being indexed. See the Lucene documentation for more information.

merge-factor

Determines how often segment indices are merged. See the Lucene documentation for more information.

A simple example of the input:

<?xml version="1.0" encoding="UTF-8"?>
<lucene:index xmlns:lucene="http://apache.org/cocoon/lucene/1.0"
    merge-factor="20"
    create="false"
    directory="index"
    max-field-length="10000"
    analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer">
  <lucene:document url="a.html">
    <documentTitle lucene:store="true">Doggerel</documentTitle>
    <body>The quick brown fox jumped over the lazy dog</body>
  </lucene:document>
  <lucene:document url="b.html">
    <documentTitle lucene:store="true">Lorem Ipsum</documentTitle>
    <body>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</body>
    <body>Nunc a mauris blandit ligula scelerisque tristique.</body>
  </lucene:document>
</lucene:index>
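
To show how input of this shape can be recognized from a SAX stream, here is a minimal sketch that collects the url attribute of every lucene:document element. LuceneInputReader is a hypothetical class that uses only the JDK's SAX parser; it is not the transformer's own implementation and performs no indexing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Cocoon: lists the documents found in the input XML.
final class LuceneInputReader {

    static final String LUCENE_NS = "http://apache.org/cocoon/lucene/1.0";

    /** Returns the url attribute of each lucene:document element, in document order. */
    static List<String> documentUrls(String xml) {
        final List<String> urls = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match on namespace URI and local name
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (LUCENE_NS.equals(uri) && "document".equals(localName)) {
                        urls.add(atts.getValue("url"));
                    }
                }
            });
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need not handle checked exceptions.
            throw new IllegalStateException("Could not parse input", e);
        }
        return urls;
    }
}
```

Matching on the namespace URI rather than the lucene: prefix keeps the check valid even if the input binds the namespace to a different prefix.
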
Nested Class Summary:
static class  LuceneIndexTransformer.IndexHelperField   
static class  LuceneIndexTransformer.IndexerConfiguration   
Field Summary
public static final  String ANALYZER_CLASSNAME_CONFIG     
public static final  String ANALYZER_CLASSNAME_PARAMETER     
public static final  String ANALYZER_CLASSNAME_DEFAULT     
public static final  String DIRECTORY_CONFIG     
public static final  String DIRECTORY_PARAMETER     
public static final  String DIRECTORY_DEFAULT     
public static final  String MERGE_FACTOR_CONFIG     
public static final  String MERGE_FACTOR_PARAMETER     
public static final  int MERGE_FACTOR_DEFAULT     
public static final  String MAX_FIELD_LENGTH_CONFIG     
public static final  String MAX_FIELD_LENGTH_PARAMETER     
public static final  int MAX_FIELD_LENGTH_DEFAULT     
public static final  String LUCENE_URI     
public static final  String LUCENE_QUERY_ELEMENT     
public static final  String LUCENE_QUERY_ANALYZER_ATTRIBUTE     
public static final  String LUCENE_QUERY_DIRECTORY_ATTRIBUTE     
public static final  String LUCENE_QUERY_CREATE_ATTRIBUTE     
public static final  String LUCENE_QUERY_MERGE_FACTOR_ATTRIBUTE     
public static final  String LUCENE_QUERY_MAX_FIELD_LENGTH_ATTRIBUTE     
public static final  String LUCENE_DOCUMENT_ELEMENT     
public static final  String LUCENE_DOCUMENT_URL_ATTRIBUTE     
public static final  String LUCENE_ELEMENT_ATTR_TO_TEXT_ATTRIBUTE     
public static final  String LUCENE_ELEMENT_ATTR_STORE_VALUE     
public static final  String LUCENE_ELAPSED_TIME_ATTRIBUTE     
public static final  String CDATA     
protected  File workDir     
Fields inherited from org.apache.cocoon.xml.AbstractXMLProducer:
EMPTY_CONTENT_HANDLER,  xmlConsumer,  contentHandler,  lexicalHandler
Method from org.apache.cocoon.transformation.LuceneIndexTransformer Summary:
characters,   configure,   contextualize,   endDocument,   endElement,   endPrefixMapping,   getKey,   getValidity,   recycle,   setup,   startDocument,   startElement,   startPrefixMapping
Methods from org.apache.cocoon.xml.AbstractXMLPipe:
characters,   comment,   endCDATA,   endDTD,   endDocument,   endElement,   endEntity,   endPrefixMapping,   ignorableWhitespace,   processingInstruction,   setDocumentLocator,   skippedEntity,   startCDATA,   startDTD,   startDocument,   startElement,   startEntity,   startPrefixMapping
Methods from org.apache.cocoon.xml.AbstractXMLProducer:
recycle,   setConsumer,   setContentHandler,   setLexicalHandler
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.cocoon.transformation.LuceneIndexTransformer Detail:
 public  void characters(char[] ch,
    int start,
    int length) throws SAXException 
 public  void configure(Configuration conf) throws ConfigurationException 
    Configure the transformer. The configuration parameters are stored as general defaults, which may be overridden by parameters specified in the sitemap pipeline, or by attributes of the query element(s) in the XML input document.
 public  void contextualize(Context context) throws ContextException 
    Contextualize this class.
 public  void endDocument() throws SAXException 
 public  void endElement(String namespaceURI,
    String localName,
    String qName) throws SAXException 
 public  void endPrefixMapping(String prefix) throws SAXException 
    End the scope of a prefix-URI mapping.
 public Serializable getKey() 
    Generate the unique key. This key must be unique inside the space of this component.
 public SourceValidity getValidity() 
    Generate the validity object.
 public  void recycle() 
 public  void setup(SourceResolver resolver,
    Map objectModel,
    String src,
    Parameters parameters) throws IOException, SAXException, ProcessingException 
    Set up the transformer. Called when the pipeline is assembled. The parameters are those specified as child elements of the <map:transform> element in the sitemap. These parameters are optional: if no parameters are specified here, the defaults are supplied by the component configuration. Any parameters specified here may be overridden by attributes of the lucene:index element in the input document.
 public  void startDocument() throws SAXException 
 public  void startElement(String namespaceURI,
    String localName,
    String qName,
    Attributes atts) throws SAXException 
 public  void startPrefixMapping(String prefix,
    String uri) throws SAXException 
    Begin the scope of a prefix-URI Namespace mapping.