java.text
Class RuleBasedCollator

java.lang.Object
  java.text.Collator
    java.text.RuleBasedCollator
- All Implemented Interfaces:
- java.lang.Cloneable, java.util.Comparator
- public class RuleBasedCollator
- extends Collator
This class is a concrete subclass of Collator suitable
for string collation in a wide variety of languages. An instance of
this class is normally returned by the getInstance method
of Collator with rules predefined for the requested
locale. However, an instance of this class can be created manually
with any desired rules.
Rules take the form of a String with the following syntax
- Modifier: '@'
- Relation: '<' | ';' | ',' | '=' : <text>
- Reset: '&' : <text>
- '<' - The text argument is greater than the prior term at the primary difference level.
- ';' - The text argument is greater than the prior term at the secondary difference level.
- ',' - The text argument is greater than the prior term at the tertiary difference level.
- '=' - The text argument is equal to the prior term
As for the text argument itself, this is any sequence of Unicode characters not in the following ranges: 0x0009-0x000D, 0x0020-0x002F, 0x003A-0x0040, 0x005B-0x0060, and 0x007B-0x007E. If these characters are desired, they must be enclosed in single quotes. If any whitespace is encountered, it is ignored. (For example, "a b" is equal to "ab").
The reset operation inserts the following rule at the point where the
text argument to it exists in the previously declared rule string. This
makes it easy to add new rules to an existing string by simply including
them in a reset sequence at the end. Note that the text argument, or
at least the first character of it, must be present somewhere in the
previously declared rules in order to be inserted properly. If this
is not satisfied, a ParseException will be thrown.
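The reset mechanism described above can be sketched as follows. This is a minimal illustration, not part of the class itself: the ResetExtensionDemo class and its build and extend helpers are hypothetical names introduced here, and the sketch assumes that getRules returns a string that can be passed back to the constructor unchanged.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical helper class for illustration; not part of java.text.
final class ResetExtensionDemo {

    // Builds a collator and converts the checked ParseException so that
    // callers do not need their own try/catch block.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    // Appends a reset sequence to the rules of an existing collator.
    static RuleBasedCollator extend(RuleBasedCollator base, String resetSequence) {
        return build(base.getRules() + " " + resetSequence);
    }

    public static void main(String[] args) {
        RuleBasedCollator base = build("< a < b < c");
        // "& b < x" inserts x directly after b, giving a < b < x < c.
        RuleBasedCollator extended = extend(base, "& b < x");
        System.out.println(extended.compare("b", "x") < 0); // true
        System.out.println(extended.compare("x", "c") < 0); // true
    }
}
```

Because the reset text "b" already appears in the base rules, the appended sequence parses without error.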
This system of configuring RuleBasedCollator is more complex
than it needs to be. It was developed at Taligent and later accepted
into the Java standard library by Sun.
Here are a few examples of rule strings:
"< a < b < c" - This string says that a is less than b which is less than c, with all differences being primary differences.
"< a,A < b,B < c,C" - This string says that 'A' is greater than 'a' with a tertiary strength comparison. Both 'b' and 'B' are greater than 'a' and 'A' during a primary strength comparison. But 'B' is greater than 'b' under a tertiary strength comparison.
"< a < c & a < b " - This sequence is identical in function to the "< a < b < c" rule string above. The '&' reset symbol indicates that the rule "< b" is to be inserted after the text argument "a" in the previous rule string segment.
"< a < b & y < z" - This is an error. The character 'y' does not appear anywhere in the previous rule string segment so the rule following the reset rule cannot be inserted.
"< a & A @ < e & E < f& F" - This sequence is equivalent to the following "< a & A < E & e < f & F".
For a description of the various comparison strength types, see the
documentation for the Collator class.
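A few of the example rule strings can be checked with a short program. This is a sketch only: the RuleStringExamples class and its build and sign helpers are hypothetical names, and the results noted in the comments assume a runtime whose RuleBasedCollator follows the rule semantics described here.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical helper class for illustration; not part of java.text.
final class RuleStringExamples {

    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    // Returns -1, 0 or 1 for compare(x, y) under the given rules and strength.
    static int sign(String rules, int strength, String x, String y) {
        RuleBasedCollator collator = build(rules);
        collator.setStrength(strength);
        return Integer.signum(collator.compare(x, y));
    }

    public static void main(String[] args) {
        String cased = "< a,A < b,B < c,C";
        // 'a' and 'A' differ only at the tertiary level.
        System.out.println(sign(cased, Collator.TERTIARY, "a", "A")); // -1
        System.out.println(sign(cased, Collator.PRIMARY, "a", "A"));  // 0
        // 'A' and 'b' differ at the primary level.
        System.out.println(sign(cased, Collator.PRIMARY, "A", "b"));  // -1
        // The reset form orders b before c, just like "< a < b < c".
        System.out.println(sign("< a < c & a < b", Collator.TERTIARY, "b", "c")); // -1
        System.out.println(sign("< a < b < c", Collator.TERTIARY, "b", "c"));     // -1
    }
}
```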
As an additional complication to this already overly complex rule scheme, if any characters precede the first rule, these characters are considered ignorable. They will be treated as if they did not exist during comparisons. For example, "- < a < b ..." would make '-' an ignorable character such that the strings "high-tech" and "hightech" would be considered identical.
A ParseException will be thrown for any of the following
conditions:
- Unquoted punctuation characters in a text argument.
- A relational or reset operator not followed by a text argument.
- A reset operator where the text argument is not present in the previous rule string section.
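The three error conditions above can be probed with a small validity check. This is a sketch under assumptions: RuleValidation and parses are hypothetical names, the sample rule strings are chosen for illustration, and the exception message and error offset are implementation-specific, so the sketch only reports whether parsing succeeded.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical helper class for illustration; not part of java.text.
final class RuleValidation {

    // Returns true if the rule string is accepted by the constructor.
    static boolean parses(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "< a < b < c",      // well-formed
            "< a < b$c",        // unquoted punctuation in a text argument
            "< a < , b",        // relational operator with no text argument
            "< a < b & y < z"   // reset text not present in earlier rules
        };
        for (String rules : samples) {
            System.out.println(rules + " -> " + (parses(rules) ? "ok" : "ParseException"));
        }
    }
}
```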
| Nested Class Summary | |
(package private) static class |
RuleBasedCollator.CollationElement
This class describes the rank that a character (or a sequence of characters) has in the lexicographic order. |
(package private) static class |
RuleBasedCollator.CollationSorter
Basic collation instruction (internal format) to build the series of collation elements. |
| Field Summary | |
private java.lang.Object[] |
ce_table
This is the table of collation element values |
private boolean |
inverseAccentComparison
This variable is true if accents need to be sorted in the other direction. |
private int |
last_primary_value
This is the value of the last sequence entered into ce_table. |
private int |
last_tertiary_value
This is the value of the last secondary sequence of the primary 0, entered into ce_table. |
(package private) java.util.HashMap |
prefix_tree
Quick-prefix finder. |
private java.lang.String |
rules
This is the original rule string. |
(package private) static RuleBasedCollator.CollationElement |
SPECIAL_UNKNOWN_SEQ
This collation element is reserved for unknown sequences. |
| Fields inherited from class java.text.Collator |
CANONICAL_DECOMPOSITION, decmp, FULL_DECOMPOSITION, IDENTICAL, NO_DECOMPOSITION, PRIMARY, SECONDARY, strength, TERTIARY |
| Constructor Summary | |
RuleBasedCollator(java.lang.String rules)
This method initializes a new instance of RuleBasedCollator
with the specified collation rules. |
|
| Method Summary | |
private void |
buildCollationVector(java.util.ArrayList parsedElements)
This method uses the sorting instructions built by parseString(java.lang.String)
to build collation elements which can be directly used to sort strings. |
private void |
buildPrefixAccess()
Build a tree where all keys are the texts of collation elements and data is the collation element itself. |
java.lang.Object |
clone()
This method creates a copy of this object. |
int |
compare(java.lang.String source,
java.lang.String target)
This method returns an integer which indicates whether the first specified String is less than, greater than, or equal to
the second. |
boolean |
equals(java.lang.Object obj)
This method tests this object for equality against the specified object. |
(package private) static int |
findPrefixLength(java.lang.String prefix,
java.lang.String s)
This method returns the number of leading characters that the two parameter strings have in common. |
CollationElementIterator |
getCollationElementIterator(CharacterIterator source)
This method returns an instance of CollationElementIterator
for the String represented by the specified
CharacterIterator. |
CollationElementIterator |
getCollationElementIterator(java.lang.String source)
This method returns an instance of CollationElementIterator
for the specified String under the collation rules for this
object. |
CollationKey |
getCollationKey(java.lang.String source)
This method returns an instance of CollationKey for the
specified String. |
(package private) RuleBasedCollator.CollationElement |
getDefaultAccentedElement(char c)
This method builds a default collation element for an accented character without invoking the database created from the rules passed to the constructor. |
(package private) RuleBasedCollator.CollationElement |
getDefaultElement(char c)
This method builds a default collation element without invoking the database created from the rules passed to the constructor. |
java.lang.String |
getRules()
This method returns a String containing the collation rules
for this object. |
int |
hashCode()
This method returns a hash value for this object. |
private void |
mergeRules(int offset,
java.lang.String starter,
java.util.ArrayList main,
java.util.ArrayList patch)
Here we are merging two sets of sorting instructions: 'patch' into 'main'. |
private java.util.ArrayList |
parseString(java.lang.String rules)
This method completely parses a string 'rules' containing sorting rules. |
private int |
subParseString(boolean stop_on_reset,
java.util.ArrayList v,
int base_offset,
java.lang.String rules)
This method parses a string and builds a set of sorting instructions. |
| Methods inherited from class java.text.Collator |
compare, decomposeCharacter, equals, getAvailableLocales, getDecomposition, getInstance, getInstance, getStrength, setDecomposition, setStrength |
| Methods inherited from class java.lang.Object |
finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
rules
private java.lang.String rules
- This is the original rule string.
ce_table
private java.lang.Object[] ce_table
- This is the table of collation element values
prefix_tree
java.util.HashMap prefix_tree
- Quick-prefix finder.
last_primary_value
private int last_primary_value
- This is the value of the last sequence entered into ce_table. It is used to compute the ordering value of an unspecified character.
last_tertiary_value
private int last_tertiary_value
- This is the value of the last secondary sequence of the primary 0, entered into ce_table. It is used to compute the ordering value of an unspecified accented character.
inverseAccentComparison
private boolean inverseAccentComparison
- This variable is true if accents need to be sorted
in the other direction.
SPECIAL_UNKNOWN_SEQ
static final RuleBasedCollator.CollationElement SPECIAL_UNKNOWN_SEQ
- This collation element is reserved for unknown sequences.
The JDK uses it to mark and sort the characters which have
no collation rules.
| Constructor Detail |
RuleBasedCollator
public RuleBasedCollator(java.lang.String rules) throws ParseException
- This method initializes a new instance of RuleBasedCollator with the specified collation rules. Note that an application normally obtains an instance of RuleBasedCollator by calling the getInstance method of Collator. That method automatically loads the proper set of rules for the desired locale.
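A minimal sketch of the usual route through Collator.getInstance follows. The LocaleCollatorDemo class and rulesFor method are hypothetical names. Whether the returned collator is actually a RuleBasedCollator depends on the runtime, so the sketch checks with instanceof instead of assuming it.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of java.text.
final class LocaleCollatorDemo {

    // Returns the rule string used for the locale, or null if the runtime's
    // collator for that locale is not a RuleBasedCollator.
    static String rulesFor(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        if (collator instanceof RuleBasedCollator) {
            return ((RuleBasedCollator) collator).getRules();
        }
        return null;
    }

    public static void main(String[] args) {
        String rules = rulesFor(Locale.US);
        System.out.println(rules == null
                ? "collator is not rule-based"
                : "rule string length: " + rules.length());
    }
}
```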
| Method Detail |
findPrefixLength
static int findPrefixLength(java.lang.String prefix, java.lang.String s)
- This method returns the number of leading characters that the two
parameter strings have in common.
mergeRules
private void mergeRules(int offset,
java.lang.String starter,
java.util.ArrayList main,
java.util.ArrayList patch)
throws ParseException
- Here we are merging two sets of sorting instructions: 'patch' into 'main'. This method
checks whether it is possible to find an anchor point for the rules to be merged and
then inserts them at that precise point.
subParseString
private int subParseString(boolean stop_on_reset,
java.util.ArrayList v,
int base_offset,
java.lang.String rules)
throws ParseException
- This method parses a string and builds a set of sorting instructions. The parsing
may be only partial in case the rules are to be merged later.
clone
public java.lang.Object clone()
parseString
private java.util.ArrayList parseString(java.lang.String rules) throws ParseException
- This method completely parses a string 'rules' containing sorting rules.
buildCollationVector
private void buildCollationVector(java.util.ArrayList parsedElements) throws ParseException
- This method uses the sorting instructions built by parseString(java.lang.String) to build collation elements which can be directly used to sort strings.
buildPrefixAccess
private void buildPrefixAccess()
- Build a tree where all keys are the texts of collation elements and data is
the collation element itself. The tree is used when extracting all prefixes
for a given text.
compare
public int compare(java.lang.String source, java.lang.String target)
- This method returns an integer which indicates whether the first specified String is less than, greater than, or equal to the second. The value depends not only on the collation rules in effect, but also the strength and decomposition settings of this object.
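Since the class implements java.util.Comparator, an instance can be handed directly to a sort routine. The sketch below uses a deliberately reversed rule string to make the effect of compare visible. CollatorSortDemo and sortWith are hypothetical names introduced for this illustration.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration; not part of java.text.
final class CollatorSortDemo {

    // Returns a sorted copy of the words, ordered by a collator built from the rules.
    static List<String> sortWith(String rules, List<String> words) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // the collator acts as the Comparator
        return sorted;
    }

    public static void main(String[] args) {
        // c sorts first and a sorts last under these rules.
        List<String> words = Arrays.asList("a", "b", "c", "ba", "bc");
        System.out.println(sortWith("< c < b < a", words)); // [c, b, bc, ba, a]
    }
}
```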
equals
public boolean equals(java.lang.Object obj)
- This method tests this object for equality against the specified
object. This will be true if and only if the specified object is
another reference to this object.
- Specified by:
equals in interface java.util.Comparator
- Overrides:
equals in class Collator
getDefaultElement
RuleBasedCollator.CollationElement getDefaultElement(char c)
- This method builds a default collation element without invoking
the database created from the rules passed to the constructor.
getDefaultAccentedElement
RuleBasedCollator.CollationElement getDefaultAccentedElement(char c)
- This method builds a default collation element for an accented character
without invoking the database created from the rules passed to the constructor.
getCollationElementIterator
public CollationElementIterator getCollationElementIterator(java.lang.String source)
- This method returns an instance of CollationElementIterator for the specified String under the collation rules for this object.
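The iterator returned for a String can be walked with next() until it yields CollationElementIterator.NULLORDER. The following sketch collects the primary order of each element. ElementIteratorDemo and primaryOrders are hypothetical names, and the actual numeric values are implementation-specific, so only their relative order is meaningful.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of java.text.
final class ElementIteratorDemo {

    // Returns the primary order of every collation element in the text.
    static List<Integer> primaryOrders(String rules, String text) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
        List<Integer> primaries = new ArrayList<>();
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int element;
        while ((element = it.next()) != CollationElementIterator.NULLORDER) {
            primaries.add(CollationElementIterator.primaryOrder(element));
        }
        return primaries;
    }

    public static void main(String[] args) {
        // One element per character; c has the largest primary order, a the smallest.
        System.out.println(primaryOrders("< a < b < c", "cab"));
    }
}
```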
getCollationElementIterator
public CollationElementIterator getCollationElementIterator(CharacterIterator source) throws gnu.classpath.NotImplementedException
- This method returns an instance of CollationElementIterator for the String represented by the specified CharacterIterator.
getCollationKey
public CollationKey getCollationKey(java.lang.String source)
- This method returns an instance of CollationKey for the specified String. The object returned will have a more efficient mechanism for its comparison function that could provide speed benefits if multiple comparisons are performed, such as during a sort.
- Specified by:
getCollationKey in class Collator
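As a sketch of the sorting use case mentioned above, each string can be converted to a CollationKey once and the keys sorted through their natural ordering. CollationKeyDemo and sortByKey are hypothetical names. The actual speed benefit depends on the input and the runtime and is not measured here.

```java
import java.text.CollationKey;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class for illustration; not part of java.text.
final class CollationKeyDemo {

    // Sorts the words by precomputed collation keys and returns the sorted strings.
    static List<String> sortByKey(String rules, List<String> words) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word)); // computed once per string
        }
        Collections.sort(keys); // CollationKey implements Comparable
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "A", "c", "a", "B");
        System.out.println(sortByKey("< a,A < b,B < c,C", words)); // [a, A, b, B, c]
    }
}
```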
getRules
public java.lang.String getRules()
- This method returns a String containing the collation rules for this object.
hashCode
public int hashCode()