java.lang.Object
org.apache.avalon.framework.logger.AbstractLogEnabled
org.apache.cocoon.xml.AbstractXMLProducer
org.apache.cocoon.xml.AbstractXMLPipe
org.apache.cocoon.transformation.AbstractTransformer
org.apache.cocoon.transformation.LuceneIndexTransformer
All Implemented Interfaces:
CacheableProcessingComponent, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.context.Contextualizable, Transformer, XMLPipe, org.apache.avalon.excalibur.pool.Recyclable, XMLProducer
A Lucene index creation transformer.
This transformer reads a document with elements in the namespace
http://apache.org/cocoon/lucene/1.0 and either creates a new Lucene index
or updates an existing one.
It has several parameters, which can be set in three places: in the sitemap component configuration, as parameters to the transformation step in the pipeline, or as attributes of the root element in the source XML document. Attributes in the source document override the transformation parameters, which in turn override any configuration parameters.
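The precedence rule above can be sketched in plain Java. This is only an illustration, not the transformer's own code: the class `SettingPrecedence`, its `resolve` method, and the use of plain maps for the three sources are all hypothetical.

```java
import java.util.Map;

/** Hypothetical helper, not part of Cocoon: illustrates the documented precedence only. */
final class SettingPrecedence {

    private SettingPrecedence() {
    }

    /**
     * Returns the effective value of a setting. A document attribute wins over a
     * transformation parameter, which wins over the component configuration.
     * The fallback is used only when none of the three sources defines the setting.
     */
    static String resolve(String name,
                          Map<String, String> documentAttributes,
                          Map<String, String> transformParameters,
                          Map<String, String> componentConfiguration,
                          String fallback) {
        String value = documentAttributes.get(name);
        if (value != null) {
            return value;
        }
        value = transformParameters.get(name);
        if (value != null) {
            return value;
        }
        value = componentConfiguration.get(name);
        return value != null ? value : fallback;
    }
}
```

With `directory` set in all three places, `resolve` returns the value taken from the source document; a setting that appears only in the component configuration is returned from there.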
directory: Location of the directory where the index files are stored. This path is relative to the Cocoon work directory.
create: Controls whether the index is recreated.
If create="false" and the index already exists, the index is updated: any documents that had already been indexed are removed from the index and reinserted.
If the index does not exist, it is created even if create="false".
If create="true", any existing index is destroyed and a new index is created.
If you are rebuilding your entire index, set create="true": the indexer then does not need to remove old documents from the index, so it is faster.
max-field-length: Maximum number of terms to index in a field. As far as the index is concerned, the document is effectively truncated at this point. The default value, 10k, may not be sufficient for large documents.
analyzer: Class name of the Lucene text analyzer to use. The choice typically depends on the language of the text being indexed. See the Lucene documentation for more information.
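The rules for the create attribute described above reduce to a small decision table. The sketch below spells it out; the class `CreatePolicy` and its `Action` enum are hypothetical names chosen for illustration and do not appear in the transformer.

```java
/** Hypothetical illustration of the documented create semantics; not Cocoon code. */
final class CreatePolicy {

    /** What happens to the index for a given combination of inputs. */
    enum Action { CREATE_NEW_INDEX, UPDATE_EXISTING_INDEX }

    private CreatePolicy() {
    }

    /**
     * create="true" always starts a fresh index. create="false" updates the
     * index if it exists, and otherwise creates it anyway.
     */
    static Action decide(boolean create, boolean indexExists) {
        if (create || !indexExists) {
            return Action.CREATE_NEW_INDEX;
        }
        return Action.UPDATE_EXISTING_INDEX;
    }

    /**
     * Only an update has to remove previously indexed documents before
     * reinserting them, which is why a full rebuild is faster with create="true".
     */
    static boolean mustRemoveOldDocuments(Action action) {
        return action == Action.UPDATE_EXISTING_INDEX;
    }
}
```

Only the combination create="false" with an existing index leads to an update, and that is the one case in which old documents have to be removed first.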
    <?xml version="1.0" encoding="UTF-8"?>
    <lucene:index xmlns:lucene="http://apache.org/cocoon/lucene/1.0"
                  merge-factor="20"
                  create="false"
                  directory="index"
                  max-field-length="10000"
                  analyzer="org.apache.lucene.analysis.standard.StandardAnalyzer">
      <lucene:document url="a.html">
        <documentTitle lucene:store="true">Doggerel</documentTitle>
        <body>The quick brown fox jumped over the lazy dog</body>
      </lucene:document>
      <lucene:document url="b.html">
        <documentTitle lucene:store="true">Lorem Ipsum</documentTitle>
        <body>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</body>
        <body>Nunc a mauris blandit ligula scelerisque tristique.</body>
      </lucene:document>
    </lucene:index>
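To make the shape of the input concrete, the following sketch reads a document like the example above with the JDK's SAX parser and collects the url attribute of every lucene:document element. It does not touch Lucene or Cocoon and is not how the transformer is implemented; the class `LuceneInputSketch` and the method `documentUrls` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reader for the input format; uses only the JDK's SAX parser. */
final class LuceneInputSketch {

    /** Namespace of the lucene:index and lucene:document elements. */
    static final String LUCENE_NS = "http://apache.org/cocoon/lucene/1.0";

    private LuceneInputSketch() {
    }

    /** Returns the url attribute of each lucene:document element, in document order. */
    static List<String> documentUrls(String xml) {
        final List<String> urls = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match elements by namespace URI
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    if (LUCENE_NS.equals(uri) && "document".equals(localName)) {
                        // The url attribute is unprefixed, so its namespace URI is empty.
                        urls.add(attributes.getValue("", "url"));
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input document", e);
        }
        return urls;
    }
}
```

Elements are matched by namespace URI and local name rather than by the lucene: prefix, so the sketch also works when the input binds that namespace to a different prefix.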
Author: Vadim Gritsenko (vgritsenko@apache.org), Conal Tuohy (conal@nzetc.org)
Version: $Id: LuceneIndexTransformer.java 433543 2006-08-22 06:22:54Z crossley $

| Nested Class Summary: | ||
|---|---|---|
| static class | LuceneIndexTransformer.IndexHelperField | |
| static class | LuceneIndexTransformer.IndexerConfiguration | |
| Field Summary | ||
|---|---|---|
| public static final String | ANALYZER_CLASSNAME_CONFIG | |
| public static final String | ANALYZER_CLASSNAME_PARAMETER | |
| public static final String | ANALYZER_CLASSNAME_DEFAULT | |
| public static final String | DIRECTORY_CONFIG | |
| public static final String | DIRECTORY_PARAMETER | |
| public static final String | DIRECTORY_DEFAULT | |
| public static final String | MERGE_FACTOR_CONFIG | |
| public static final String | MERGE_FACTOR_PARAMETER | |
| public static final int | MERGE_FACTOR_DEFAULT | |
| public static final String | MAX_FIELD_LENGTH_CONFIG | |
| public static final String | MAX_FIELD_LENGTH_PARAMETER | |
| public static final int | MAX_FIELD_LENGTH_DEFAULT | |
| public static final String | LUCENE_URI | |
| public static final String | LUCENE_QUERY_ELEMENT | |
| public static final String | LUCENE_QUERY_ANALYZER_ATTRIBUTE | |
| public static final String | LUCENE_QUERY_DIRECTORY_ATTRIBUTE | |
| public static final String | LUCENE_QUERY_CREATE_ATTRIBUTE | |
| public static final String | LUCENE_QUERY_MERGE_FACTOR_ATTRIBUTE | |
| public static final String | LUCENE_QUERY_MAX_FIELD_LENGTH_ATTRIBUTE | |
| public static final String | LUCENE_DOCUMENT_ELEMENT | |
| public static final String | LUCENE_DOCUMENT_URL_ATTRIBUTE | |
| public static final String | LUCENE_ELEMENT_ATTR_TO_TEXT_ATTRIBUTE | |
| public static final String | LUCENE_ELEMENT_ATTR_STORE_VALUE | |
| public static final String | LUCENE_ELAPSED_TIME_ATTRIBUTE | |
| public static final String | CDATA | |
| protected File | workDir | |
| Fields inherited from org.apache.cocoon.xml.AbstractXMLProducer: |
|---|
| EMPTY_CONTENT_HANDLER, xmlConsumer, contentHandler, lexicalHandler |
| Method from org.apache.cocoon.transformation.LuceneIndexTransformer Summary: |
|---|
| characters, configure, contextualize, endDocument, endElement, endPrefixMapping, getKey, getValidity, recycle, setup, startDocument, startElement, startPrefixMapping |
| Methods from org.apache.cocoon.xml.AbstractXMLPipe: |
|---|
| characters, comment, endCDATA, endDTD, endDocument, endElement, endEntity, endPrefixMapping, ignorableWhitespace, processingInstruction, setDocumentLocator, skippedEntity, startCDATA, startDTD, startDocument, startElement, startEntity, startPrefixMapping |
| Methods from org.apache.cocoon.xml.AbstractXMLProducer: |
|---|
| recycle, setConsumer, setContentHandler, setLexicalHandler |
| Methods from java.lang.Object: |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method from org.apache.cocoon.transformation.LuceneIndexTransformer Detail: |
|---|
| setup: parameters are read from the <map:transform> element in the sitemap. They are all optional: if none are specified here, the defaults are supplied by the component configuration. Any parameters specified here may be overridden by attributes of the lucene:index element in the input document. |