apache-solr-1.3.0
org.apache.solr.analysis
public interface TokenizerFactory

All Known Implementing Classes:
    NGramTokenizerFactory, LowerCaseTokenizerFactory, HTMLStripWhitespaceTokenizerFactory, StandardTokenizerFactory, WhitespaceTokenizerFactory, BaseTokenizerFactory, RussianLetterTokenizerFactory, KeywordTokenizerFactory, ChineseTokenizerFactory, HTMLStripStandardTokenizerFactory, PatternTokenizerFactory, LetterTokenizerFactory, CJKTokenizerFactory, EdgeNGramTokenizerFactory

A TokenizerFactory creates TokenStreams that break up a stream of characters into tokens.

TokenizerFactories are registered for FieldTypes with the IndexSchema through the schema.xml file.

Example schema.xml entry that registers a TokenizerFactory implementation to tokenize fields of type "cool":

<fieldtype name="cool" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    ...

A single instance of any registered TokenizerFactory is created via the default constructor and is reused for each FieldType.
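
The lifecycle described above (construct once through the default constructor, initialize, then reuse) can be illustrated with a minimal sketch. This is not Solr's actual loading code: the Factory interface, RecordingFactory, the register and lookup methods, and the "one instance per field type name" bookkeeping are all hypothetical stand-ins chosen so the example compiles without Solr on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: none of these types come from Solr.
class FactoryLifecycleSketch {

    /** Hypothetical stand-in for the init/getArgs part of the factory contract. */
    interface Factory {
        void init(Map<String, String> args);
        Map<String, String> getArgs();
    }

    /** Hypothetical factory that records how often init is called. */
    static class RecordingFactory implements Factory {
        int initCalls = 0;
        private Map<String, String> args;

        // A public no-argument constructor is what reflective creation relies on.
        public RecordingFactory() { }

        public void init(Map<String, String> args) {
            this.args = args;
            initCalls++;
        }

        public Map<String, String> getArgs() {
            return args;
        }
    }

    // Assumption: one factory instance is kept per registered field type name.
    private final Map<String, Factory> byFieldType = new HashMap<String, Factory>();

    /** Creates the class through its no-argument constructor, then calls init once. */
    Factory register(String fieldType, String className, Map<String, String> args) {
        try {
            Factory factory = (Factory) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            factory.init(args);
            byFieldType.put(fieldType, factory);
            return factory;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    /** Returns the already-initialized instance; no new object is created. */
    Factory lookup(String fieldType) {
        return byFieldType.get(fieldType);
    }
}
```

In this sketch, calling register once for "cool" and then calling lookup("cool") repeatedly always yields the same object, which is why a factory written this way should not keep per-document state in its fields.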

Method from org.apache.solr.analysis.TokenizerFactory Summary:
create, getArgs, init
Method from org.apache.solr.analysis.TokenizerFactory Detail:
 public TokenStream create(Reader input)
    Creates a TokenStream over the specified input.
 public Map getArgs()
    Accessor method for reporting the args used to initialize this factory.

    Implementations are strongly encouraged to return the contents of the Map passed to the init method.

 public void init(Map args)
    init will be called just once, immediately after creation.

    The args are user-level initialization parameters that may be specified when declaring the factory in the schema.xml file.
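
To make the three-method contract concrete, here is a minimal sketch of an implementation that stores the Map given to init, returns it from getArgs, and uses one parameter inside create. It is written against a hypothetical stand-in interface (SimpleTokenizerFactory) so that it compiles without Solr or Lucene: the real create returns a TokenStream, whereas this stand-in returns a List of strings. The class names and the "delimiter" parameter are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch only: none of these types come from Solr or Lucene.
class TokenizerFactorySketch {

    /** Hypothetical stand-in: create returns List<String> instead of a TokenStream. */
    interface SimpleTokenizerFactory {
        void init(Map<String, String> args);
        Map<String, String> getArgs();
        List<String> create(Reader input);
    }

    /** Hypothetical factory that splits input on a configurable delimiter. */
    static class DelimiterTokenizerFactory implements SimpleTokenizerFactory {
        private Map<String, String> args = Collections.emptyMap();
        private String delimiter = " "; // default when no "delimiter" arg is given

        public void init(Map<String, String> args) {
            // Keep the map so getArgs can report exactly what was passed in.
            this.args = args;
            String configured = args.get("delimiter"); // hypothetical parameter name
            if (configured != null && configured.length() > 0) {
                delimiter = configured;
            }
        }

        public Map<String, String> getArgs() {
            return args;
        }

        public List<String> create(Reader input) {
            // Read the whole input; a real tokenizer would stream instead.
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            try {
                int read;
                while ((read = input.read(buffer)) != -1) {
                    text.append(buffer, 0, read);
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            // Split on the literal delimiter and drop empty pieces.
            List<String> tokens = new ArrayList<String>();
            for (String piece : text.toString().split(Pattern.quote(delimiter))) {
                if (piece.length() > 0) {
                    tokens.add(piece);
                }
            }
            return tokens;
        }
    }
}
```

With init given a map containing delimiter="," the sketch's create turns the input "a,b,,c" into the tokens a, b and c. Because configuration is read only in init, create can be called any number of times on the same instance.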