java.text
Class RuleBasedCollator

java.lang.Object
  extended by java.text.Collator
      extended by java.text.RuleBasedCollator
All Implemented Interfaces:
java.lang.Cloneable, java.util.Comparator

public class RuleBasedCollator
extends Collator

This class is a concrete subclass of Collator suitable for string collation in a wide variety of languages. An instance of this class is normally returned by the getInstance method of Collator with rules predefined for the requested locale. However, an instance of this class can be created manually with any desired rules.
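As a rough illustration of the two routes described above, the sketch below asks Collator.getInstance for a locale collator and, separately, builds a collator by hand. The class name ObtainCollator and its helper methods are invented for this example. Whether getInstance really returns a RuleBasedCollator depends on the runtime, so the code checks rather than assumes.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical demo class; not part of java.text.
class ObtainCollator {

    // Returns the rule string of the locale's collator, or null when the
    // runtime hands back some other Collator subclass.
    static String localeRules(Locale locale) {
        Collator c = Collator.getInstance(locale);
        if (c instanceof RuleBasedCollator) {
            return ((RuleBasedCollator) c).getRules();
        }
        return null;
    }

    // Builds a collator from a hand-written rule string.
    static RuleBasedCollator fromRules(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        String rules = localeRules(Locale.ENGLISH);
        System.out.println(rules == null
                ? "locale collator is not rule based"
                : "locale rule string length: " + rules.length());

        RuleBasedCollator manual = fromRules("< a < b < c");
        System.out.println("manual rules: " + manual.getRules());
    }
}
```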

Rules take the form of a String with the following syntax:

Modifier: '@'
Relation: '<' | ';' | ',' | '=' : <text>
Reset: '&' : <text>

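The modifier character indicates that accents sort backward, as is the case with French. The modifier applies to all rules after the modifier but before the next primary sequence. If placed at the end of the sequence, it applies to all unknown accented characters. The relational operators specify how the text argument relates to the previous term. The relation characters have the following meanings:

'<' - The text argument is greater than the prior term at the primary difference level.
';' - The text argument is greater than the prior term at the secondary difference level.
',' - The text argument is greater than the prior term at the tertiary difference level.
'=' - The text argument is equal to the prior term.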

As for the text argument itself, this is any sequence of Unicode characters not in the following ranges: 0x0009-0x000D, 0x0020-0x002F, 0x003A-0x0040, 0x005B-0x0060, and 0x007B-0x007E. If these characters are desired, they must be enclosed in single quotes. If any whitespace is encountered, it is ignored. (For example, "a b" is equal to "ab").
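The quoting and whitespace rules for text arguments can be tried out with a small sketch like the following. The class QuotedRuleText and its build helper are made up for the example, and the behavior shown is that of the standard java.text.RuleBasedCollator constructor; another implementation may differ in detail.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical demo class; not part of java.text.
class QuotedRuleText {

    // Wraps the checked ParseException so callers can stay simple.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // '-' is 0x002D, inside the reserved range 0x0020-0x002F,
        // so it has to be enclosed in single quotes.
        RuleBasedCollator quoted = build("< a < '-' < b");
        System.out.println(quoted.compare("a", "-") < 0);
        System.out.println(quoted.compare("-", "b") < 0);

        // Whitespace in the rule string is skipped, so both
        // spellings describe the same ordering.
        RuleBasedCollator spaced = build("< a < b");
        RuleBasedCollator dense = build("<a<b");
        System.out.println(
                Integer.signum(spaced.compare("a", "b"))
                        == Integer.signum(dense.compare("a", "b")));
    }
}
```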

The reset operation inserts the following rule at the point where the text argument to it exists in the previously declared rule string. This makes it easy to add new rules to an existing string by simply including them in a reset sequence at the end. Note that the text argument, or at least the first character of it, must be present somewhere in the previously declared rules in order to be inserted properly. If this is not satisfied, a ParseException will be thrown.

This system of configuring RuleBasedCollator is needlessly complex. It was developed at Taligent and accepted into the Java standard library by Sun.

Here are a couple of examples of rule strings:

"< a < b < c" - This string says that a is less than b, which is less than c, with all differences being primary differences.

"< a,A < b,B < c,C" - This string says that 'A' is greater than 'a' with a tertiary strength comparison. Both 'b' and 'B' are greater than 'a' and 'A' during a primary strength comparison. But 'B' is greater than 'b' under a tertiary strength comparison.

"< a < c & a < b " - This sequence is identical in function to the "< a < b < c" rule string above. The '&' reset symbol indicates that the rule "< b" is to be inserted after the text argument "a" in the previous rule string segment.

"< a < b & y < z" - This is an error. The character 'y' does not appear anywhere in the previous rule string segment so the rule following the reset rule cannot be inserted.

"< a & A @ < e & E < f& F" - This sequence is equivalent to the following "< a & A < E & e < f & F".
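The plain ordering and reset examples in the list above can be checked with a short sketch. RuleOrdering and its order helper are names invented here; the helper reduces the result of compare to -1, 0 or 1 so the outcomes are easy to read.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical demo class; not part of java.text.
class RuleOrdering {

    // Sign of compare(x, y) under the given rule string.
    static int order(String rules, String x, String y) {
        try {
            RuleBasedCollator c = new RuleBasedCollator(rules);
            return Integer.signum(c.compare(x, y));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Each text argument sorts after the previous term.
        System.out.println(order("< a < b < c", "a", "b"));
        System.out.println(order("< a < b < c", "b", "c"));

        // The reset "& a" inserts "< b" right after 'a',
        // which places 'b' between 'a' and 'c'.
        System.out.println(order("< a < c & a < b", "a", "b"));
        System.out.println(order("< a < c & a < b", "b", "c"));
    }
}
```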

For a description of the various comparison strength types, see the documentation for the Collator class.
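To make the effect of the strength setting concrete, the following sketch reuses the "< a,A < b,B < c,C" rule string from the examples and compares the same pairs at different strengths. StrengthDemo and compareAt are illustrative names, not library API.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical demo class; not part of java.text.
class StrengthDemo {

    static final String RULES = "< a,A < b,B < c,C";

    // Sign of compare(x, y) with the collator set to the given strength.
    static int compareAt(int strength, String x, String y) {
        try {
            RuleBasedCollator c = new RuleBasedCollator(RULES);
            c.setStrength(strength);
            return Integer.signum(c.compare(x, y));
        } catch (ParseException e) {
            throw new IllegalStateException("rule string should parse", e);
        }
    }

    public static void main(String[] args) {
        // Case is only a tertiary difference in these rules.
        System.out.println(compareAt(Collator.TERTIARY, "a", "A"));
        System.out.println(compareAt(Collator.PRIMARY, "a", "A"));
        // 'A' and 'b' differ at the primary level.
        System.out.println(compareAt(Collator.PRIMARY, "A", "b"));
    }
}
```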

As an additional complication to this already overly complex rule scheme, if any characters precede the first rule, these characters are considered ignorable. They will be treated as if they did not exist during comparisons. For example, "- < a < b ..." would make '-' an ignorable character such that the strings "high-tech" and "hightech" would be considered identical.

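A ParseException will be thrown for any of the following conditions:

Unquoted punctuation characters in a text argument.
A relational or reset operator not followed by a text argument.
A reset operator where the text argument is not present in the previous rule string section.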
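A quick way to see which rule strings are rejected is to attempt construction and catch the exception. The sketch below does that with a made-up helper, RuleErrors.parses; the sample rule strings are illustrations chosen for this example.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical demo class; not part of java.text.
class RuleErrors {

    // True when the rule string is accepted by the constructor.
    static boolean parses(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "< a < b < c",      // well formed
            "< a < b-c",        // unquoted punctuation in a text argument
            "< a < , b",        // relation with no text argument
            "< a < b & y < z"   // reset target 'y' was never declared
        };
        for (String rules : samples) {
            System.out.println("\"" + rules + "\" parses: " + parses(rules));
        }
    }
}
```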


Nested Class Summary
(package private) static class RuleBasedCollator.CollationElement
          This class describes the rank of a character (or a sequence of characters) in the lexicographic order.
(package private) static class RuleBasedCollator.CollationSorter
          Basic collation instruction (internal format) to build the series of collation elements.
 
Field Summary
private  java.lang.Object[] ce_table
          This is the table of collation element values
private  boolean inverseAccentComparison
          This variable is true if accents need to be sorted in the other direction.
private  int last_primary_value
          This is the value of the last sequence entered into ce_table.
private  int last_tertiary_value
          This is the value of the last secondary sequence of the primary 0, entered into ce_table.
(package private)  java.util.HashMap prefix_tree
          Quick-prefix finder.
private  java.lang.String rules
          This is the original rule string.
(package private) static RuleBasedCollator.CollationElement SPECIAL_UNKNOWN_SEQ
          This collation element is reserved for unknown sequences.
 
Fields inherited from class java.text.Collator
CANONICAL_DECOMPOSITION, decmp, FULL_DECOMPOSITION, IDENTICAL, NO_DECOMPOSITION, PRIMARY, SECONDARY, strength, TERTIARY
 
Constructor Summary
RuleBasedCollator(java.lang.String rules)
          This method initializes a new instance of RuleBasedCollator with the specified collation rules.
 
Method Summary
private  void buildCollationVector(java.util.ArrayList parsedElements)
          This method uses the sorting instructions built by parseString(java.lang.String) to build collation elements which can be directly used to sort strings.
private  void buildPrefixAccess()
          Build a tree where all keys are the texts of collation elements and data is the collation element itself.
 java.lang.Object clone()
          This method creates a copy of this object.
 int compare(java.lang.String source, java.lang.String target)
          This method returns an integer which indicates whether the first specified String is less than, greater than, or equal to the second.
 boolean equals(java.lang.Object obj)
          This method tests this object for equality against the specified object.
(package private) static int findPrefixLength(java.lang.String prefix, java.lang.String s)
          This method returns the number of leading characters the two parameters have in common, that is, the length of their common prefix.
 CollationElementIterator getCollationElementIterator(CharacterIterator source)
          This method returns an instance of CollationElementIterator for the String represented by the specified CharacterIterator.
 CollationElementIterator getCollationElementIterator(java.lang.String source)
          This method returns an instance of CollationElementIterator for the specified String under the collation rules for this object.
 CollationKey getCollationKey(java.lang.String source)
          This method returns an instance of CollationKey for the specified String.
(package private)  RuleBasedCollator.CollationElement getDefaultAccentedElement(char c)
          This method builds a default collation element for an accented character without invoking the database created from the rules passed to the constructor.
(package private)  RuleBasedCollator.CollationElement getDefaultElement(char c)
          This method builds a default collation element without invoking the database created from the rules passed to the constructor.
 java.lang.String getRules()
          This method returns a String containing the collation rules for this object.
 int hashCode()
          This method returns a hash value for this object.
private  void mergeRules(int offset, java.lang.String starter, java.util.ArrayList main, java.util.ArrayList patch)
          Here we are merging two sets of sorting instructions: 'patch' into 'main'.
private  java.util.ArrayList parseString(java.lang.String rules)
          This method completely parses a string 'rules' containing sorting rules.
private  int subParseString(boolean stop_on_reset, java.util.ArrayList v, int base_offset, java.lang.String rules)
          This method parses a string and builds a set of sorting instructions.
 
Methods inherited from class java.text.Collator
compare, decomposeCharacter, equals, getAvailableLocales, getDecomposition, getInstance, getInstance, getStrength, setDecomposition, setStrength
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

rules

private java.lang.String rules
This is the original rule string.


ce_table

private java.lang.Object[] ce_table
This is the table of collation element values


prefix_tree

java.util.HashMap prefix_tree
Quick-prefix finder.


last_primary_value

private int last_primary_value
This is the value of the last sequence entered into ce_table. It is used to compute the ordering value of an unspecified character.


last_tertiary_value

private int last_tertiary_value
This is the value of the last secondary sequence of the primary 0, entered into ce_table. It is used to compute the ordering value of an unspecified accented character.


inverseAccentComparison

private boolean inverseAccentComparison
This variable is true if accents need to be sorted in the other direction.


SPECIAL_UNKNOWN_SEQ

static final RuleBasedCollator.CollationElement SPECIAL_UNKNOWN_SEQ
This collation element is reserved for unknown sequences. The JDK uses it to mark and sort the characters which have no collation rules.

Constructor Detail

RuleBasedCollator

public RuleBasedCollator(java.lang.String rules)
                  throws ParseException
This method initializes a new instance of RuleBasedCollator with the specified collation rules. Note that an application normally obtains an instance of RuleBasedCollator by calling the getInstance method of Collator. That method automatically loads the proper set of rules for the desired locale.

Method Detail

findPrefixLength

static int findPrefixLength(java.lang.String prefix,
                            java.lang.String s)
This method returns the number of leading characters the two parameters have in common, that is, the length of their common prefix.
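Since findPrefixLength is package private, it cannot be called from application code. A minimal sketch of what such a common-prefix computation might look like is given below; the class PrefixLength and the method commonPrefixLength are written for this example and are not the library's implementation.

```java
// Hypothetical stand-in for the package-private helper described above.
class PrefixLength {

    // Counts how many leading characters the two strings share.
    static int commonPrefixLength(String prefix, String s) {
        int limit = Math.min(prefix.length(), s.length());
        int i = 0;
        while (i < limit && prefix.charAt(i) == s.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(commonPrefixLength("abc", "abd"));
        System.out.println(commonPrefixLength("ab", "abcd"));
        System.out.println(commonPrefixLength("", "x"));
    }
}
```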


mergeRules

private void mergeRules(int offset,
                        java.lang.String starter,
                        java.util.ArrayList main,
                        java.util.ArrayList patch)
                 throws ParseException
Here we are merging two sets of sorting instructions: 'patch' into 'main'. This method checks whether it is possible to find an anchor point for the rules to be merged and then inserts them at that precise point.


subParseString

private int subParseString(boolean stop_on_reset,
                           java.util.ArrayList v,
                           int base_offset,
                           java.lang.String rules)
                    throws ParseException
This method parses a string and builds a set of sorting instructions. The parsing may be only partial in case the rules are to be merged later.


clone

public java.lang.Object clone()
This method creates a copy of this object.

Overrides:
clone in class Collator

parseString

private java.util.ArrayList parseString(java.lang.String rules)
                                 throws ParseException
This method completely parses a string 'rules' containing sorting rules.


buildCollationVector

private void buildCollationVector(java.util.ArrayList parsedElements)
                           throws ParseException
This method uses the sorting instructions built by parseString(java.lang.String) to build collation elements which can be directly used to sort strings.


buildPrefixAccess

private void buildPrefixAccess()
Build a tree where all keys are the texts of collation elements and data is the collation element itself. The tree is used when extracting all prefixes for a given text.
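The summary only says that element texts are used as keys, so the following is a guess at how such a map could be consulted, not a description of the actual code. It assumes a HashMap keyed by element text (matching the declared type of prefix_tree) and looks up the longest key that starts a given string. PrefixLookup, put and longestPrefix are hypothetical names, and the integer rank stands in for a real collation element.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a prefix lookup over element texts.
class PrefixLookup {

    private final Map<String, Integer> byText = new HashMap<>();
    private int longestKey = 0;

    // Registers an element text together with a placeholder rank.
    void put(String text, int rank) {
        byText.put(text, rank);
        longestKey = Math.max(longestKey, text.length());
    }

    // Returns the longest registered text that s starts with at
    // offset 'from', or null when no registered text matches.
    String longestPrefix(String s, int from) {
        int max = Math.min(longestKey, s.length() - from);
        for (int len = max; len > 0; len--) {
            String candidate = s.substring(from, from + len);
            if (byText.containsKey(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PrefixLookup lookup = new PrefixLookup();
        lookup.put("c", 1);
        lookup.put("ch", 2); // a two-character sequence with its own rank
        System.out.println(lookup.longestPrefix("chat", 0));
        System.out.println(lookup.longestPrefix("cat", 0));
    }
}
```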


compare

public int compare(java.lang.String source,
                   java.lang.String target)
This method returns an integer which indicates whether the first specified String is less than, greater than, or equal to the second. The value depends not only on the collation rules in effect, but also the strength and decomposition settings of this object.

Specified by:
compare in class Collator
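Because the class implements java.util.Comparator, compare is most often reached indirectly through a sort. The sketch below sorts a few words under a deliberately reversed rule string; CollatorSort and sortWith are names made up for the example.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of java.text.
class CollatorSort {

    // Returns the words sorted under the given rule string.
    static List<String> sortWith(String rules, String... words) {
        try {
            RuleBasedCollator c = new RuleBasedCollator(rules);
            List<String> out = new ArrayList<>(Arrays.asList(words));
            out.sort(c); // the collator acts as the Comparator
            return out;
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // With these rules 'c' sorts first and 'a' last.
        System.out.println(sortWith("< c < b < a", "abc", "cab", "bca"));
    }
}
```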

equals

public boolean equals(java.lang.Object obj)
This method tests this object for equality against the specified object. This will be true if and only if the specified object is another reference to this object.

Specified by:
equals in interface java.util.Comparator
Overrides:
equals in class Collator

getDefaultElement

RuleBasedCollator.CollationElement getDefaultElement(char c)
This method builds a default collation element without invoking the database created from the rules passed to the constructor.


getDefaultAccentedElement

RuleBasedCollator.CollationElement getDefaultAccentedElement(char c)
This method builds a default collation element for an accented character without invoking the database created from the rules passed to the constructor.


getCollationElementIterator

public CollationElementIterator getCollationElementIterator(java.lang.String source)
This method returns an instance of CollationElementIterator for the specified String under the collation rules for this object.
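As a small illustration, the iterator returned for a String can be walked until it reports NULLORDER, extracting the primary order of each element along the way. ElementWalk and primaryOrders are names invented for this sketch; the actual numeric values are implementation specific, so the example only relies on their relative order.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of java.text.
class ElementWalk {

    // Collects the primary order of every collation element in 'text'.
    static List<Integer> primaryOrders(String rules, String text) {
        try {
            RuleBasedCollator c = new RuleBasedCollator(rules);
            CollationElementIterator it = c.getCollationElementIterator(text);
            List<Integer> out = new ArrayList<>();
            int element;
            while ((element = it.next()) != CollationElementIterator.NULLORDER) {
                out.add(CollationElementIterator.primaryOrder(element));
            }
            return out;
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // One element per character is expected for this simple input.
        System.out.println(primaryOrders("< a < b < c", "abc"));
    }
}
```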


getCollationElementIterator

public CollationElementIterator getCollationElementIterator(CharacterIterator source)
                                                     throws gnu.classpath.NotImplementedException
This method returns an instance of CollationElementIterator for the String represented by the specified CharacterIterator.


getCollationKey

public CollationKey getCollationKey(java.lang.String source)
This method returns an instance of CollationKey for the specified String. The object returned will have a more efficient mechanism for its comparison function that could provide speed benefits if multiple comparisons are performed, such as during a sort.

Specified by:
getCollationKey in class Collator
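The speed benefit mentioned above comes from computing each key once and then comparing keys instead of strings. A minimal sketch of that pattern follows; KeySort and sortByKeys are illustrative names, and no timing claims are made here.

```java
import java.text.CollationKey;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of java.text.
class KeySort {

    // Sorts the words by precomputed collation keys.
    static List<String> sortByKeys(String rules, String... words) {
        try {
            RuleBasedCollator c = new RuleBasedCollator(rules);
            CollationKey[] keys = new CollationKey[words.length];
            for (int i = 0; i < words.length; i++) {
                keys[i] = c.getCollationKey(words[i]); // computed once per word
            }
            Arrays.sort(keys); // CollationKey is Comparable
            List<String> out = new ArrayList<>();
            for (CollationKey key : keys) {
                out.add(key.getSourceString());
            }
            return out;
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortByKeys("< c < b < a", "a", "c", "b"));
    }
}
```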

getRules

public java.lang.String getRules()
This method returns a String containing the collation rules for this object.


hashCode

public int hashCode()
This method returns a hash value for this object.

Specified by:
hashCode in class Collator