org.apache.fop.layout.hyphenation
public class: HyphenationTree
java.lang.Object
   org.apache.fop.layout.hyphenation.TernaryTree
      org.apache.fop.layout.hyphenation.HyphenationTree

All Implemented Interfaces:
    Serializable, PatternConsumer, Cloneable

This tree structure stores the hyphenation patterns in an efficient way for fast lookup. It provides the method to hyphenate a word.
Field Summary
protected  ByteVector vspace    value space: stores the interletter values 
protected  HashMap stoplist    This map stores hyphenation exceptions 
protected  TernaryTree classmap    This map stores the character classes 
Fields inherited from org.apache.fop.layout.hyphenation.TernaryTree:
lo,  hi,  eq,  sc,  kv,  root,  freenode,  length,  BLOCK_SIZE
Constructor:
 public HyphenationTree() 
Method from org.apache.fop.layout.hyphenation.HyphenationTree Summary:
addClass,   addException,   addPattern,   findPattern,   getValues,   hstrcmp,   hyphenate,   hyphenate,   loadPatterns,   main,   packValues,   printStats,   searchPatterns,   unpackValues
Methods from org.apache.fop.layout.hyphenation.TernaryTree:
balance,   clone,   find,   find,   init,   insert,   insert,   insertBalanced,   keys,   knows,   main,   printStats,   size,   strcmp,   strcmp,   strcpy,   strlen,   strlen,   trimToSize
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.fop.layout.hyphenation.HyphenationTree Detail:
 public  void addClass(String chargroup) 
    Add a character class to the tree. It is used by PatternParser as a callback to add character classes. Character classes define the valid word characters for hyphenation. If a word contains a character not defined in any of the classes, it is not hyphenated. A class also defines how to normalize characters so that they can be compared with the stored patterns. Pattern files usually use only lower case characters; in that case a class for the letter 'a', for example, should be defined as "aA", the first character being the normalization char.
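    The normalization rule described above can be illustrated with a minimal sketch. It is not the FOP implementation (which keeps the classes in the TernaryTree field classmap); the class name CharClassSketch, the HashMap storage, and the normalize helper are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of character classes; not the FOP code.
class CharClassSketch {
    // Maps every character of a class to the class's first character.
    private final Map<Character, Character> classmap = new HashMap<>();

    // Register a class such as "aA": 'a' and 'A' both normalize to 'a'.
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char norm = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            classmap.put(chargroup.charAt(i), norm);
        }
    }

    // Returns the normalized word, or null when some character belongs
    // to no class (such a word would not be hyphenated).
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character norm = classmap.get(word.charAt(i));
            if (norm == null) {
                return null;
            }
            sb.append(norm.charValue());
        }
        return sb.toString();
    }
}
```

    With the classes "aA" and "bB" registered, normalize("Ab") yields "ab", while a word containing a digit is rejected because no class covers it.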
 public  void addException(String word,
    ArrayList hyphenatedword) 
    Add an exception to the tree. It is used by the PatternParser class as a callback to store the hyphenation exceptions.
 public  void addPattern(String pattern,
    String ivalue) 
    Add a pattern to the tree. It is mainly used by the PatternParser class as a callback to add a pattern to the tree.
 public String findPattern(String pat) 
 protected byte[] getValues(int k) 
 protected int hstrcmp(char[] s,
    int si,
    char[] t,
    int ti) 
    String compare; returns 0 if the strings are equal or if t is a substring of s.
 public Hyphenation hyphenate(String word,
    int remainCharCount,
    int pushCharCount) 
    Hyphenate word and return a Hyphenation object.
 public Hyphenation hyphenate(char[] w,
    int offset,
    int len,
    int remainCharCount,
    int pushCharCount) 
    Hyphenate word and return a Hyphenation object containing the hyphenation points.
 public  void loadPatterns(String filename) throws HyphenationException 
    Read hyphenation patterns from an XML file.
 public static  void main(String[] argv) throws Exception 
 protected int packValues(String values) 
    Packs the values by storing them in 4 bits, two values per byte. Values range from 0 to 9. We use zero as the terminator, so we add 1 to each value.
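    The packing scheme can be sketched as follows. This is an illustration under stated assumptions, not the FOP method: it returns a fresh byte[] instead of an offset into vspace, the high-nibble-first order is an assumption, and the names NibblePackingSketch, pack, and unpack are hypothetical.

```java
// Hypothetical illustration of 4-bit value packing; not the FOP code.
class NibblePackingSketch {
    // Pack a digit string such as "0102" two values per byte.
    // Each digit is stored as digit + 1 so that a zero nibble marks the end.
    static byte[] pack(String values) {
        int n = values.length();
        // n / 2 + 1 bytes always leaves at least one zero nibble at the end.
        byte[] out = new byte[n / 2 + 1];
        for (int i = 0; i < n; i++) {
            int v = (values.charAt(i) - '0' + 1) & 0x0f;
            if ((i & 1) == 0) {
                out[i / 2] = (byte) (v << 4);   // even index: high nibble
            } else {
                out[i / 2] |= (byte) v;         // odd index: low nibble
            }
        }
        return out;
    }

    // Reverse of pack: read nibbles until the zero terminator.
    static String unpack(byte[] packed) {
        StringBuilder sb = new StringBuilder();
        for (byte b : packed) {
            int hi = (b >> 4) & 0x0f;
            if (hi == 0) {
                break;
            }
            sb.append((char) ('0' + hi - 1));
            int lo = b & 0x0f;
            if (lo == 0) {
                break;
            }
            sb.append((char) ('0' + lo - 1));
        }
        return sb.toString();
    }
}
```

    In this sketch the string "0102" becomes the bytes 0x12, 0x13, 0x00: the digits 0, 1, 0, 2 are stored as the nibbles 1, 2, 1, 3, followed by a zero terminator.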
 public  void printStats() 
 protected  void searchPatterns(char[] word,
    int index,
    byte[] il) 

    Search for all possible partial matches of word starting at index and update the interletter values. In other words, it does something like:

    for(i=0; i

    But it is done in an efficient way, since the patterns are stored in a ternary tree. In fact, this is the whole purpose of having the tree: doing this search without having to test every single pattern. The number of patterns for languages such as English ranges from 4000 to 10000. Thus, doing thousands of string comparisons for each word to hyphenate would be really slow without the tree. The tradeoff is memory, but using a ternary tree instead of a trie almost halves the memory used by Lout or TeX. It is also faster than using a hash table.
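    For contrast, the kind of linear scan that the tree avoids might look like the sketch below. It is not the FOP code: the parallel arrays letters and values are a hypothetical pattern layout (one interletter value per gap, so one more value than letters), and keeping the larger value at each position is an assumption borrowed from the Liang-style hyphenation used in TeX.

```java
// Hypothetical naive search over every pattern; not the FOP code.
class NaivePatternSearchSketch {
    // letters[p]: the characters of pattern p.
    // values[p]:  its interletter values, letters[p].length() + 1 entries.
    // il:         interletter values of the word, updated in place.
    static void searchPatterns(char[] word, int index, byte[] il,
                               String[] letters, byte[][] values) {
        String rest = new String(word, index, word.length - index);
        for (int p = 0; p < letters.length; p++) {
            // One string comparison per pattern: this is the cost
            // that a tree lookup is meant to avoid.
            if (!rest.startsWith(letters[p])) {
                continue;
            }
            for (int k = 0; k < values[p].length; k++) {
                int pos = index + k;
                if (pos < il.length && values[p][k] > il[pos]) {
                    il[pos] = values[p][k];   // keep the larger value
                }
            }
        }
    }
}
```

    With thousands of patterns, this loop performs thousands of startsWith calls for every start index of every word, which is the cost the paragraph above describes.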

 protected String unpackValues(int k)